Now that we’ve made great progress on content quality, which is the most important thing, here’s the problem: how to improve the user experience of the content. It will surely come as no surprise that the solutions documentation is authored in Word. You can get a minimally viable PDF from Word, but that’s about it.
I dream of the time when there is such a nice DITA authoring interface that these authors could create DITA topics directly, but that time isn’t now. They are more than casual authors, but writing and content management aren’t close to 100% of their time either. As such, authoring and reviewing in Word is a requirement.
At the same time, we already have a refined process to publish to our support portal in a searchable fashion. This process is based on DITA to HTML.
You can see where I’m going. How do we get from Word to DITA so we can use our existing publishing pipeline?
The first solution that comes to mind is the DITA4Publishers Word2DITA plugin.
Installation
The instructions here seem to be out of date. The good news is that the reality is easier.
- Install the DITA4Publishers plugins.
- Copy the sample file from GitHub to the samples folder under DITA-OT.
- Run the transform.
ant -f build.xml -Dtranstype=word2dita -Dargs.input=word2dita_single_doc_to_map_and_topics_01.docx
In the out folder you’ll see a map and topics. This is a sample document with the default style-to-tag mapping.
Images
Unfortunately, images are not extracted. The solution (found here) is to open the Word file in oXygen and extract the media folder to the topics folder created by word2dita. This was a bitter disappointment: I had envisioned a single build target that would convert the Word file to DITA and then to PDF, HTML, and EPUB. Instead, it looked as though there would have to be a manual step in there to extract the images.
But then I put two and two together: the DOCX file is a ZIP archive, and Ant has an unzip task. So I added these lines to the target:
<unzip src="${args.input}" dest="${temp}" />
<copy todir="${out}/topics/media" failonerror="false">
  <fileset dir="${temp}/word/media" />
</copy>
Now the DITA output is complete.
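If you want to check the "DOCX is just a ZIP" idea outside of Ant, here is a minimal Python sketch of the same unzip-and-copy step. It is not part of the word2dita plugin or my build file; the function name `extract_docx_media` and the destination path in the usage note are my own illustrations, and it assumes the images sit directly under `word/media/` in the archive, which is where the Ant lines above look for them.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath

# Folder inside a DOCX archive where embedded images are stored
# (the same folder the Ant <fileset> above points at).
MEDIA_PREFIX = "word/media/"


def extract_docx_media(docx_path, dest_dir):
    """Copy every file under word/media/ in the DOCX into dest_dir.

    Hypothetical helper for illustration. Returns the sorted list of file
    names written. If the document has no images, nothing is created and
    an empty list is returned (similar in spirit to failonerror="false").
    """
    dest = Path(dest_dir)
    written = []
    with zipfile.ZipFile(docx_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.startswith(MEDIA_PREFIX):
                continue
            # Keep only the base name so files land directly in dest_dir.
            name = PurePosixPath(info.filename).name
            dest.mkdir(parents=True, exist_ok=True)
            with archive.open(info) as src, open(dest / name, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(name)
    return sorted(written)
```

For the sample document, a call might look like `extract_docx_media("word2dita_single_doc_to_map_and_topics_01.docx", "out/topics/media")`, assuming the transform wrote its topics to `out/topics`.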