Word2DITA Plugin (DITA4Publishers)

One of the groups my team supports is solutions engineering. This group figures out the best way to run third-party applications on our platform. Although its members are not writers, one of their primary deliverables is documentation, and a lot of it. My team provides editorial services throughout the entire documentation lifecycle.

We’ve made great progress on content quality, which is the most important thing. Now the problem is how to improve the user experience of that content. It will surely come as no surprise that the solutions documentation is authored in Word. You can get a minimally viable PDF out of Word, but that’s about it.

I dream of a day when DITA authoring tools are friendly enough for these engineers to write DITA topics directly, but that day isn’t here yet. They are more than casual authors, but writing and content management take up far less than 100% of their time. As a result, authoring and reviewing in Word is a requirement.

At the same time, we already have a refined process to publish to our support portal in a searchable fashion. This process is based on DITA to HTML.

You can see where I’m going. How do we get from Word to DITA so we can use our existing publishing pipeline?

The first solution that comes to mind is the DITA4Publishers Word2DITA plugin.

Installation

The published installation instructions seem to be out of date. The good news is that the actual process is simpler:

1. Install the DITA4Publishers plugins.
2. Copy the sample file (word2dita_single_doc_to_map_and_topics_01.docx) from GitHub to the samples folder under DITA-OT.
3. Run the transform:

ant -f build.xml -Dtranstype=word2dita -Dargs.input=word2dita_single_doc_to_map_and_topics_01.docx

In the out folder you’ll find a map and its topics. The sample document uses the default style-to-tag mapping.

Images

Unfortunately, images are not extracted. The workaround I found is to open the Word file in oXygen and extract its media folder into the topics folder created by word2dita. This is a bitter disappointment: I had envisioned a single build target that would convert the Word file to DITA and then to PDF, HTML, and ePUB. Instead, there will have to be a manual step to extract the images.

But then I put two and two together: a DOCX file is a ZIP archive, and Ant has an unzip task. So I added these lines to the target:

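<!-- A .docx is a ZIP archive; unpack it (${temp} and ${out} are assumed to be defined as properties). -->
<unzip src="${args.input}" dest="${temp}" />
<!-- Word keeps embedded images in word/media; copy them next to the topics.
     failonerror="false" keeps the build going if the document has no images. -->
<copy todir="${out}/topics/media" failonerror="false">
  <fileset dir="${temp}/word/media" />
</copy>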

Now the DITA output is complete.
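The same trick works outside Ant. Here is a minimal shell sketch, assuming Info-ZIP unzip is installed and the output layout used above (out/topics/media). The function name extract_docx_media is hypothetical and is not part of DITA4Publishers or DITA-OT.

```shell
# Hypothetical helper: copy the images embedded in a .docx into the media
# folder used by the word2dita topics. A .docx is a ZIP archive, and Word
# stores embedded images under word/media/.
extract_docx_media() {
  docx="$1"    # input Word file
  outdir="$2"  # e.g. out/topics/media (assumed layout)
  mkdir -p "$outdir"
  status=0
  # -o: overwrite without prompting; -j: drop the word/media/ prefix so the
  # files land directly in $outdir; -q: quiet.
  unzip -o -j -q "$docx" 'word/media/*' -d "$outdir" || status=$?
  # unzip exits with 11 ("no matching files") when the document has no
  # images; treat that as success, like failonerror="false" in the Ant copy.
  [ "$status" -eq 11 ] && status=0
  return "$status"
}
```

For example, extract_docx_media word2dita_single_doc_to_map_and_topics_01.docx out/topics/media. Because of -j, the images end up directly in the media folder, which matches what the Ant copy produces.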

Ditanauts

Exploring the greater DITA frontier
