DITA > HTML > JSON

At Information Development World 2015 some attendees expressed interest in the JSON documentation format that feeds my documentation portal.

Starting from DITA source, there is a series of two transformations:

  1. HTML2 from DITA4Publishers, which flattens the directory structure.
  2. A custom XSLT that reads the resulting index and creates nested structures representing the document.

Each topic in the map becomes a “document” element in the JSON that is made up of the following pieces:

  • Title: topic title
  • ID: topic filename
  • Unique key: top-level document filename + topic filename
  • Ancestors: list of ancestor topics at all levels
  • Summary*: topic shortdesc
  • Body: topic body
  • HREF: topic path + topic filename
  • Documents*: list of sub-documents
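
For example, a single topic might come through looking roughly like this (a hand-written sketch to show the shape; see the hierarchy.json sample mentioned below for the actual field names and nesting):

{
  "title": "Installing the toolkit",
  "id": "installing.html",
  "key": "userguide.html installing.html",
  "ancestors": ["User Guide", "Getting Started"],
  "summary": "How to install the toolkit.",
  "body": "<p>...</p>",
  "href": "getting-started/installing.html",
  "documents": []
}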

The JSON created in stage 2 is loaded into MongoDB for rendering on the documentation portal. Because the loader and the rest of the portal infrastructure were developed by the support tools team, I can’t give any insight there except to say that cross-references and image links presented a bit of a challenge.

The XSLT (ditahtml2json.xsl) and a sample JSON (hierarchy.json) generated from the DITA-OT hierarchy.ditamap are available from GitHub. More background is available in the slides from the IDW presentation.

DITA to Word for SME review

At the TC Camp unconference earlier this year, I participated in the discussion on review processes and tools. I’ve been interested in this topic for a while (here’s an article from 6 years ago). The session was pretty sobering. There have been a lot of great ideas, and even some good tools, for conducting reviews over the past 10 or 12 years. But in the final analysis, it seems that no one has been successful at implementing and sustaining any of these approaches. SMEs, in general, seem unwilling to incur any overhead whatsoever in the quest for good reviews. Overhead certainly means adopting new tools, especially ones that can’t be used offline, but it also means having to give even a moment’s thought to the process.

So how can information developers move past the ad hoc approach to reviewing? Clearly, we need to provide a familiar experience so SMEs don’t have to think about it. The unpleasant conclusion is that it should be based on a common authoring and review process such as Google Docs or (gasp!) Word+Sharepoint. Despite the contempt we feel for Word, my team is asked constantly to provide content in these forms for review. So, if you can’t beat ’em, join ’em.

Since my company is in the final stages of moving from Google to Office 365, we decided to look at Word as an option, even though in my opinion the Google Docs review experience is superior. The RTF transformation in the DITA Open Toolkit has been broken for as long as I can remember. In the past I’ve used HTML as an intermediate step for going from DITA to Word, but it has a lot of styling issues.

Then I found that Word does a pretty handy conversion from PDF. Just open the PDF in Word, and there it is. The formatting maybe isn’t good enough for production use, but it’s plenty good enough for review. One gap is that cross-references aren’t preserved, but that’s not a show-stopper for review.

It’s hard to overstate the revulsion I feel in using PDF as an interchange format. How ridiculous is it to strip out all the hard-won semantics from the source content, then heuristically bring back a poor imitation of that information? But I’m no Randian hero, just a guy trying to get stuff done.

Automation

Ok, so it’s sort of interesting that you can convert PDF to Word just by opening it up and saving it, but that’s a manual process. What about automating the conversion?

Naturally PowerShell is the answer. The full script is a bit longer, but here are the important parts.

$wordApp = New-Object -ComObject Word.Application
$wordApp.Visible = $False

$pdfFile = (Resolve-Path $pdfFile).Path
$wordDoc = $wordApp.Documents.Open($pdfFile, $false) # $false = do not prompt to confirm the conversion
Write-Host "Opened PDF file $pdfFile"

$wordFile = $pdfFile.Replace(".pdf", ".docx")
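# $SaveFormat refers to Word's WdSaveFormat enumeration; it's set up earlier in the full script (not shown in this excerpt)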
$wordDoc.SaveAs([ref]$wordFile,[ref]$SaveFormat::wdFormatDocument)
Write-Host "Saved Word file as $wordFile"

You’ll want to wrap opening the PDF file and saving the DOCX file in a try/catch block, but that’s about it. My version also adds a “Draft” watermark to the Word file.
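
For the open step, the wrapper might look something like this (just a sketch; quitting Word and exiting on failure is one reasonable way to bail out):

Try {
    $wordDoc = $wordApp.Documents.Open($pdfFile, $false)
    Write-Host "Opened PDF file $pdfFile"
} Catch {
    Write-Host "Could not open $pdfFile"
    $wordApp.Quit()
    Exit
}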

Troubleshooting

One issue I found was that sometimes the file wouldn’t close properly. The next time I tried to convert the same file, a warning like this would pop up. The solution was to go to the Task Manager and kill WINWORD.EXE.
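
That cleanup can be scripted too, though it's a blunt instrument if other Word documents happen to be open:

Get-Process WINWORD -ErrorAction SilentlyContinue | Stop-Process -Force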

Next steps

Using this utility is a two-step process. The first thing I’d like to do is create an ant target that would generate a PDF from DITA source then immediately convert it to Word.
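
Roughly, that target would just chain the existing PDF build to an exec call on the script; something like this sketch (the script name and property names here are placeholders, not my actual build file):

<target name="word" depends="pdf">
  <exec executable="powershell">
    <arg value="-File" />
    <arg value="${basedir}/pdf2word.ps1" />
    <arg value="${output.dir}/${map.name}.pdf" />
  </exec>
</target>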

Once that’s done, the desired end state would be to do regular automated builds and upload to Sharepoint. The trick will be to merge comments, but that seems to be possible with PowerShell. Although I haven’t fully investigated the approach described here, it looks promising.
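
If it works out, I'd expect the heart of it to be Word's own document Merge method driven from PowerShell, something like this untested sketch (variable names are placeholders):

$masterDoc = $wordApp.Documents.Open($masterFile)
$masterDoc.Merge($reviewedCopy)   # fold a reviewer's comments and revisions into the master copy
$masterDoc.Save()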

Information Development World 2015

Last week I attended the Information Development World conference in San Jose. My presentation, “Dynamic chunking of component-authored information,” covered how we present tech docs on my company’s support portal, how the pipeline works, and why we’re doing it that way. Reception was favorable, with interest from several companies and researchers. A representative from the DITA technical committee asked about adding a JSON transform to the Open Toolkit.

There were some interesting presentations so I thought I’d summarize the key takeaways. The full agenda is here: https://www.eiseverywhere.com/ehome/113382/schedule/

 

Unforgettable: The neuroscience of memorable content (Dr. Carmen Simon)

The user experience of interacting with information is shaped largely by expectations, and the information developer can take steps to shape the expectations. Expectations = Beliefs + Tools

 

A radical new way to control the English Language (Dr. George Gopen)

Expectations again: readers have fixed expectations about where to look for what in a text. Key insight is that the most important words in the sentence are expected at the end. (BC: The concept of information structure from linguistics leads to much the same conclusion.)

His SciAm article from 1990 is recognized as one of 36 classic articles from the publication’s history. His columns for the journal Litigation are short and digestible: http://georgegopen.com/articles/litigation/

 

Open authoring: Content collaboration across disciplines (Ralph Squillace, MS Azure)

MS Azure documentation practices enable broad collaboration using GitHub plus markdown. Docs undergo a freshness review and are updated or discarded every 3 months (BC: we should be able to get reports about potentially stale topics from Git). Key metrics: freshness, performance, satisfaction. They conduct periodic “hackadocs” with SMEs to create/update documentation.
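
A crude version of that stale-topic report is just each topic with its last commit date; a sketch, assuming the topics live in Git:

git ls-files "*.dita" | ForEach-Object { "$(git log -1 --format=%ad --date=short -- $_) $_" } | Sort-Object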

 

DocOps (Wade Clements, CA)

Inspired by the DevOps approach. Move from trying to get content perfect before publishing to being able to make corrections/adjustments quickly, and work from data not anecdotes. Capture referrals from context-sensitive help in the UI to the docs. Metrics use case: predict if a user who comes to the docs ultimately opens a support case.

 

Work smarter not harder (Skip Besthoff, InboundWriter)

By analogy to botany: some pieces of content are perennials (ongoing use, long-term interest), and some are annuals (one-time use). Focus on creating better content (perennials), not more content. The average cost of a piece of content in the enterprise is $900 (BC: according to marketing consultant Jay Baer; I think technical content costs more over its lifecycle; or maybe it’s $900/page-ish). Move from using simple keywords to topic clusters.

 

Going mapless (Don Day, founding chair of OASIS DITA technical committee)

Some use cases for DITA may not require maps for top-level navigation. To go mapless, robust search, tags/keywords, and topic-to-topic cross-references are required. Mapless DITA was implemented in wikis for “The Language of” series from XML Press: http://tlowiki.com/ See also expeDITA: http://expedita.info/ (BC: I’m not ready to jettison maps…yet.)

 

Single-sourcing publishing across multiple formats (George Bina, oXygen)

Specifically, publishing from multiple input formats (such as Excel, CSV, markdown, SVG). Dynamic transformations to DITA. It’s actually real: https://github.com/oxygenxml/dita-glass

 

Past, present, and future of DITA (Kristen Eberlein, OASIS DITA technical committee)

DITA 1.3 spec is complete and will be officially released in mid-December. It includes several interesting new features: troubleshooting topic type, classification domain/map, SVG domain, doc release notes capability.

Reviews were done using DITAweb: http://about.ditaweb.com/

DITA 2.0 won’t be out for about 5 years. Plan is to include lightweight DITA.

 

Word2DITA Plugin (DITA4Publishers)

One of the groups my team supports is solutions engineering. This group figures out the best way to run 3rd-party applications on our platform. Although they are not writers, one of their primary deliverables is documentation, and a lot of it. My team provides editorial services throughout the entire lifecycle.

Now that we’ve made great progress on content quality, which is the most important thing, here’s the problem: how to improve the user experience of the content. It will surely come as no surprise that the solutions documentation is authored in Word. You can get a minimally viable PDF from Word, but that’s about it.

I dream of the time when there is such a nice DITA authoring interface that they could create DITA topics, but that time isn’t now. These authors are more than casual authors, but writing and content management isn’t close to 100% of their time either. As such, authoring and reviewing in Word is a requirement.

At the same time, we already have a refined process to publish to our support portal in a searchable fashion. This process is based on DITA to HTML.

You can see where I’m going: how do we get from Word to DITA so we can use our existing publishing pipeline?

The first solution that comes to mind is the DITA4Publishers Word2DITA plugin.

Installation

The instructions here seem to be out of date. The good news is that the reality is easier.

  1. Install the DITA4Publishers plugins.
  2. Copy the sample file from GitHub to the samples folder under DITA-OT.
  3. Run the transform.
    ant -f build.xml -Dtranstype=word2dita -Dargs.input=word2dita_single_doc_to_map_and_topics_01.docx

In the out folder you’ll see a map and topics. This is a sample document with the default style-to-tag mapping.

Images

Unfortunately, images are not extracted. The solution (found here) is to open the Word file in oXygen and extract the media folder to the topics folder created by word2dita. This is a bitter disappointment–I had envisioned a single build target that would convert the Word file to DITA and then to PDF, HTML, and ePUB. There will have to be a manual step in there to extract the images.

But then I put 2 and 2 together: the DOCX file is a ZIP, and Ant has an unzip task. So I added these lines to the target:

<unzip src="${args.input}" dest="${temp}" />
<copy todir="${out}/topics/media" failonerror="false">
  <fileset dir="${temp}/word/media" />
</copy>

Now the DITA output is complete.

QA Check Compiler

We’ve been working on some enhancements for the QA plugin that are now available. You can download the plugin from GitHub.

The first enhancement I want to talk about is the QA check compiler.

Writing a QA script in PowerShell was a pretty keen idea even if I do say so myself. Moving to an Open Toolkit plugin was an even better idea with better execution. One of the drawbacks to the OT mechanism, however, is how complicated the expression of a simple check is.

For example, let’s say you want to flag occurrences of utilize and suggest use instead. This is the expression you have to write:

<xsl:if test="descendant::*[not($excludes)]/text()[matches(.,'utilize', 'i')]">
  <data type="msg" outputclass="term mmstp" importance="recommended">Found "utilize". Use "use".</data>
</xsl:if>

The contents of the matches call and the value and attributes of the data element are all significant and also very repetitive. As we all know, repetition leads to errors.

Authoring Checks for use with the Compiler

With the QA check compiler, you author the checks in an abbreviated form. The checks go inside a properties table inside a DITA reference topic. To express the example rule above, just add a row to a properties table to specify the severity, expression, and message.

The QA compiler, executed by the compilechecks target, takes care of converting the rows in the properties tables to checks that the plugin can execute.

  • The propdesc becomes the message for the check.
  • The propvalue becomes the argument to the matches function in the XPath expression.
  • The proptype becomes the @importance.
  • The @id of the parent properties table becomes the @outputclass of the check.

You can have as many properties tables as you want.  If the @id is term_mmstp the resulting category will be term mmstp. (Spaces aren’t allowed in @id, so an underscore is necessary but then replaced with a space in the output.) These categories are unconstrained–you can make them whatever you want.

The proptype element is limited to the values for @importance: default, deprecated, high, low, normal, obsolete, optional.
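
Putting that together, the utilize/use check from the beginning of this post might be authored like this (illustrative markup only; the column headings and severity value here are my own choices):

<properties id="term_mmstp">
  <prophead>
    <proptypehd>Severity</proptypehd>
    <propvaluehd>Expression</propvaluehd>
    <propdeschd>Message</propdeschd>
  </prophead>
  <property>
    <proptype>high</proptype>
    <propvalue>utilize</propvalue>
    <propdesc>Found "utilize". Use "use".</propdesc>
  </property>
</properties>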

Enabling the QA Compiler

The result of the QA compiler isn’t enabled by default. To do so, uncomment the xsl:include call in xsl/qa_checks/_qa_checks.xsl and also remove the term template from that stylesheet. The QA compiler produces a template called term to make it easy to integrate, and you can’t have two templates with the same name. Once the result is included, you can start adding and modifying checks in tools/qacompiler/qa_checks_r.dita, which is a DITA reference topic. Don’t forget to run ant compilechecks after editing the DITA topic.

XML-aware diff with Git

One of the less-than-perfect aspects of using Git for XML is comparing versions of a file. Standard diff tools are not optimized for files that contain markup. Not only is the markup exposed, but irrelevant details (like indentation or line length) can appear far more significant than they really are. Although you can reduce the impact by telling the diff tool to ignore whitespace, such tools will never be semantically aware.

The Windows client TortoiseGit includes a graphical diff tool. If you select a revision of a file in the Git repository, you can diff it with previous or later versions. This is a convenient feature, but it’s disappointing that the diff is not XML aware.

I just found out that oXygen includes a graphical diff tool called diffFiles.exe. It matters that it’s graphical only: it can’t write output to the console, so it has to be used interactively. But I wondered if there was a way to have TortoiseGit use diffFiles rather than TortoiseGitMerge.

It turns out that there is. Go to TortoiseGit > Settings > Diff Viewer and click Advanced. Create new entries for .dita and .xml, setting the following (adjusting the file path as needed for your environment) as the Program:

 C:\Program Files\Oxygen XML Editor 16\diffFiles.exe %base %mine

Now when you tell TortoiseGit to compare DITA or other XML files it will use the oXygen XML-aware diff rather than TortoiseGitMerge.

There are a couple of limitations. One is that you can’t use oXygen’s diff to do a 3-way merge, which can be useful if you have merge conflicts; however, I never do that with XML files. The other limitation is that the oXygen diff takes much longer to start: TortoiseGitMerge is almost instantaneous, while oXygen diff takes several seconds.

Flattening HTML output

My DITA repository has a number of subdirectories to keep maps and topics organized. This strategy is convenient, but it can be a drawback when I need to further process HTML output, as I had to do for a recent publishing project. The HTML output type in the DITA Open Toolkit retains the organization of the source files, so that every processing task turned into file tree navigation with tools that aren’t suited for it.

The DITA For Publishers HTML2 plugin provides a mechanism for flattening the output: the html2.file.organization.strategy Ant parameter. To make flattening a viable approach, there needs to be a provision for avoiding collisions. For example, say you have two directories, indir1 and indir2, each of which contains a topic file topic1.dita. The single-directory output, then, can’t be outdir/topic1.html because there are two topic1 files.

The plugin deals with this requirement by appending a string, created by generate-id(), to the file name. So indir1/topic1.dita would become outdir/topic1_d97.html and indir2/topic1.dita would become outdir/topic1_d84.html. The exact expression (in the get-result-topic-base-name-single-dir template) is

concat(relpath:getNamePart($topicUri), '_', generate-id(.))

While that’s a reasonable approach, it’s not the one that I want to use because the filenames ultimately get exposed to my customers. Since I don’t know the algorithm for generating the unique ID, it’s not deterministic enough, and a bookmarked link might become invalidated without my knowing. Instead, I’d like to prepend the parent directory name, so I modified the expression to this:

concat(relpath:getNamePart(relpath:getParent($topicUri)), '_', relpath:getNamePart($topicUri))

The result for the two example files then would be outdir/indir1_topic1.html and outdir/indir2_topic1.html. This approach has the added advantage that the output doesn’t lose information about its location in the source.

REST API documentation: From JSON to DITA (but skip the JS)

In our most recent release, a REST API was added to the product. Included in the API was some built-in documentation that is exposed in the user interface. Users can also send API calls and see the results. All this is done with Swagger and is a really nice way to get familiar with the API.

Then, of course, the request for a comprehensive API reference arrived. If you aren’t familiar with the API, it’s hard to get an overview and find what you’re looking for in the version presented in the UI. Or you may not have access to a system where you can poke around.

Since there is at least a representation of the API structure with definitions, there’s no way I was going to start from nothing or spend a lot of time copying and pasting. And since the built-in documentation was available as JSON I was sure that I’d be able to get from that to a document in a fairly straight line. Since all the other documentation is in DITA, using DITA as the target format that could then be processed in the normal manner seemed like the way to go.

Since JSON stands for “JavaScript Object Notation,” I thought that using JavaScript would be the way to go for the initial pass. Boy, was I wrong: something about not having a DOM context when embedding JavaScript in Ant meant that I could read the JSON files but not output them as XML.

After wasting enough time with JavaScript I checked into using (as a frequent reader of this blog, should such a person exist, would guess) PowerShell. After some digging around I found this blog post and my problems were pretty much solved. I use the Convert-JsonToXml function exactly as given in that post. Calling it is another simple matter:

$strJson =  Get-Content $inputFile

$xml = Convert-JsonToXml $strJson
Try {
    $xml.Get_OuterXml() | out-file $outputFile
    write-host "$outputFile"
} Catch {
    write-host "Could not save $outputFile"
}

The $inputFile variable is the JSON file, and $outputFile is the same as $inputFile, just with a .xml extension rather than .json.
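
Deriving that output name is a one-liner; something along these lines works:

$outputFile = [System.IO.Path]::ChangeExtension($inputFile, ".xml")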

A record in the API JSON file looks like this:

{
      "path": "/alerts/hardware",
      "operations": [{
        "method": "GET",
        "summary": "Get the list of hardware Alerts.",
        "notes": "Get the list of hardware Alerts generated in the cluster.",
        "type": "void",
        "nickname": "getHardwareAlerts",
        "parameters": [{
          "name": null,
          "description": "Filter criteria",
          "required": false,
          "allowMultiple": false,
          "type": "AlertRequestDTO",
          "paramType": "query"
        }],
        "responseMessages": [{
          "code": 500,
          "message": "Any internal exception while performing this operation"
        }]
      }]
    },

And the XML from PowerShell looks like this:

<item type="object">
  <path type="string">/alerts/hardware</path>
  <operations type="array">
    <item type="object">
      <method type="string">GET</method>
      <summary type="string">Get the list of hardware Alerts.</summary>
      <notes type="string">Get the list of hardware Alerts generated in the cluster.</notes>
      <type type="string">void</type>
      <nickname type="string">getHardwareAlerts</nickname>
      <parameters type="array">
        <item type="object">
          <name type="null" />
          <description type="string">Filter criteria</description>
          <required type="boolean">false</required>
          <allowMultiple type="boolean">false</allowMultiple>
          <type type="string">AlertRequestDTO</type>
          <paramType type="string">query</paramType>
        </item>
      </parameters>
      <responseMessages type="array">
        <item type="object">
          <code type="number">500</code>
          <message type="string">Any internal exception while performing this operation</message>
        </item>
      </responseMessages>
    </item>
  </operations>
</item>

So then I just need some XSLT to convert that into DITA, which is straightforward enough. The overall publishing pipeline is JSON through PowerShell to well-formed but non-validating XML through Ant/XSLT to DITA through the Open Toolkit to PDF and HTML and whatever else I might use in the future.
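
The stylesheet isn't worth reproducing in full, but the core of it is just matching the generated elements and emitting DITA. A stripped-down sketch (the reference/section mapping here is illustrative, not the production stylesheet):

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <!-- Wrap everything in a single DITA reference topic -->
  <xsl:template match="/">
    <reference id="api_reference">
      <title>REST API reference</title>
      <refbody>
        <xsl:apply-templates select="//item[path]"/>
      </refbody>
    </reference>
  </xsl:template>

  <!-- One section per API path -->
  <xsl:template match="item[path]">
    <section>
      <title><xsl:value-of select="path"/></title>
      <xsl:apply-templates select="operations/item"/>
    </section>
  </xsl:template>

  <!-- One block per operation -->
  <xsl:template match="operations/item">
    <p><b><xsl:value-of select="method"/></b>: <xsl:value-of select="summary"/></p>
    <p><xsl:value-of select="notes"/></p>
  </xsl:template>
</xsl:stylesheet>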

Here is the result. I was anxious that this was woefully inadequate API documentation, but after discussing it with other attendees at the TC Camp unconference this weekend, I realized it’s not as deficient as I feared.

How to build a lot of maps a lot of times with consistent filtering

…and not go crazy.

The mechanism for applying conditions while building with the DITA Open Toolkit has vexed me for a while. You have to specify the ditaval file at the time you initiate the build. That’s probably ok if you are working on 1 or 2 maps with 1 or 2 ditavals. I’m faced with over 50 maps that heavily reuse topics and over 10 ditaval files. Here’s the vexing part: every map is built with the same ditaval every time. I don’t want to have to manually specify the ditaval every time I build, since it never changes (per map). And my target outputs are at least two. That makes the problem twice as bad.

Over 50 maps going to two outputs is, if my math is correct, something over 100 build events if I want to generate all my docs. There’s no way I’m going to do that manually. It’s virtually guaranteed that I’ll overlook some map or apply the wrong ditaval and that I’ll be driven to the brink of madness by the time the whole thing is done.

My feeling is that the ditaval is really a property of the map. So what I’d really like to do is specify the ditaval in the map and have a build routine that passes the info. This way I only have to specify the filtering set once, when the map is created, and not every infernal time I want output. Also I want to get out of the business of production some day, and there needs to be a repeatable process for doing all this.

Then it would be nice to be able to specify sets of maps to build based on wildcards. For a while I was trying to maintain lists of files as build targets inside Ant build files. But I already have lists: the files on the filesystem. That’s duplicated content. I hate duplicated content; it means mismatches. And anyway, what I want to build is different from day to day. Maybe I need all admin docs, or all docs for a particular version, or hardware replacement docs for one platform, or hardware replacement docs for one component on all platforms, or maybe just one doc all by itself.

So those are the requirements: specify the ditaval in each map, and specify the maps based on wildcards in the filenames.

How to do this? For the first requirement, insert some type of metadata in the map. For the second, write a script that takes a set of files as an argument, reads the metadata from each map, then calls Ant. Given my past positive experience with PowerShell that’s what I’ll use.

The metadata part is easy enough. And while I’m at it I’ll specify the build targets as well.

<othermeta name="filter" content="build/external.ditaval" />
<othermeta name="targets" content="pdf epub" />

Now I need a script to make use of those elements.

The core is a loop over the items in the build set, which is specified as a command-line argument.

$buildSet = Get-ChildItem -recurse $buildSet
ForEach ($input in $buildSet) {

Then read in the metadata.
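
One step not shown in this excerpt is loading each map as XML so its metadata can be queried; in essence (glossing over path handling):

$fileContent = [xml](Get-Content $input.FullName)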

Try {
  $targets = [string]$fileContent.SelectSingleNode('/bookmap/bookmeta/othermeta[@name="targets"]/@content').get_InnerText()
} Catch [system.exception] {
  $targets = $defaultTargets
  write-host "No build targets specified. Using default `"$targets`"."
}

$targets = $targets -split " " | ForEach-Object {$_ = "`"$_`""; $_}

Try {
  $filter = [string]$fileContent.SelectSingleNode('/bookmap/bookmeta/othermeta[@name="filter"]/@content').get_InnerText()
  $filter = Join-Path -path $filePath -childpath $filter
} Catch [system.exception] {
  $filter = $defaultFilter
  write-host "No filter specified. Using default `"$filter`"."
}

That middle ugly line is because the targets passed to Ant have to be in double quotation marks. This punctuation doesn’t seem to be necessary in the legacy Windows shell but is in PowerShell (figuring this out cost me a distressing amount of time).

And finally call Ant.

Try {
  ant -f mybuild.xml "-Dargs.input=$input" "-Dargs.filter=$filter" "-Dargs.xhtml.toc=$fileName" $targets
} Catch [Exception] {
  write-host "Build failed."
}

The complete script is a bit longer of course because of the need to set up some defaults and manipulate paths but is still only 74 lines.

There are a few things that I need outside the script: a generic build file and an OT start script that I keep at the top level of my repository along with the script itself. These are all standard Open Toolkit requirements.

Some examples:

    • All admin docs for version 3.5:
      > ./build.ps1 -b maps_administration/*v3_5*.ditamap
    • Just the setup guide:
      > ./build.ps1 -b maps_administration/Setup_Guide-v3_5.ditamap
    • All hardware replacement docs for the 3000 product:
      > ./build.ps1 -b maps_hardware_replacement/*3000*.ditamap
    • Power supply replacement for all products:
      > ./build.ps1 -b maps_hardware_replacement/Power_Supply*.ditamap

Now I have the flexibility and repeatability I want.

Fixing part/chapter numbering in PDF

Apparently part and chapter numbering for bookmaps has been broken in the Open Toolkit PDF output since the beginning. Instead of numbering the chapters in a series from beginning to end, chapter numbering resets to 1 for every part so the TOC looks like this:

  • Part I
    • Chapter 1
    • Chapter 2
    • Chapter 3
  • Part II
    • Chapter 1
    • Chapter 2
    • Chapter 3
  • Part III
    • Chapter 1
    • Chapter 2
    • Chapter 3

The correct behavior would be a single chapter series that goes from 1 to 9. OT issue 1418 indicates that this was fixed in OT 1.7 but I didn’t see any change when I tried it out. Instead I changed my PDF plugin and thought I’d document the change here for anyone else who might need to do it in the future since I wasn’t able to find the answer anywhere.

There are two templates that need to be updated: one for the TOC and one for the chapter first page.

The template for the TOC is below.

<xsl:template match="*[contains(@class, ' bookmap/chapter ')] |
  *[contains(@class, ' bookmap/bookmap ')]/opentopic:map/*[contains(@class, ' map/topicref ')]" mode="tocPrefix" priority="-1">
  <xsl:call-template name="insertVariable">
    <xsl:with-param name="theVariableID" select="'Table of Contents Chapter'"/>
    <xsl:with-param name="theParameters">
        <number>
          <xsl:variable name="id" select="@id" />
          <xsl:variable name="topicChapters">
            <xsl:copy-of select="$map//*[contains(@class, ' bookmap/chapter ')]" />
          </xsl:variable>
          <xsl:variable name="chapterNumber">
            <xsl:number format="1"
              value="count($topicChapters/*[@id = $id]/preceding-sibling::*) + 1" />
          </xsl:variable>
          <xsl:value-of select="$chapterNumber" />
        </number>
    </xsl:with-param>
  </xsl:call-template>
</xsl:template>

The significant parts here are the $topicChapters and $chapterNumber variables.

The template for the chapter first page is insertChapterFirstpageStaticContent. It’s too long to reproduce in its entirety here, but the code is the same as what’s inside the number element in the TOC template. The number element that contains the $topicChapters and $chapterNumber variables needs to replace the one inside the xsl:when test="$type = 'chapter'" block.