Reporting on your repository with PowerShell, part 2

A couple months ago, some developers and support engineers were looking over some documentation and said to me, “These procedures are too complicated!” To which I said, “I know! I made them as simple as possible, but I can do only so much within the constraints of the interface.” Then the engineers asked me an astounding question: “Can you give us some complexity measure on each procedure so we know where to start making things simpler?” Because I knew how to use PowerShell to get information out of my set of DITA topics, I calmly said, “Let me look into it” while inside I was bursting with excitement.

Number of steps is of course a good place to start, but it doesn’t tell the whole story. Different types of notes, environment-specific information that the user has to type, commands, and the number of different interfaces required all play a part. Most of these I can map to DITA elements, even if imperfectly. The goal is just to get some rough measures for engineering prioritization.

The overall flow is straightforward:

  1. Define the metrics as key-value pairs, where the key is the metric name and the value is an XPath expression.
  2. Iterate over each task topic in the repository (as described in my earlier post) and count the number of occurrences of each XPath match.
  3. Output the topic title, aggregate score, and individual metric scores as CSV to import into Google Docs.

I set up the metrics like this. (The list below is a partial set for the sake of brevity.)
$metrics = @{
  'Steps' = '//step[not(substeps)] | //substep'
  'root commands' = '//step//codeblock[starts-with(text(), "#")] |
      //step//codeblock[contains(text(), "sudo")]'
  'Non-root commands' = '//step//codeblock[starts-with(text(), "$")]'
  'GUI screens/menus' = '//uicontrol'
  'Notes' = '//note'
  'User-supplied parameters' = '//varname'
}
Then inside the loop that gets all task topics, each XPath expression is applied to each task and the result recorded on a per-topic basis, again as a hash.
$title = $fileContent.SelectSingleNode("//task/title").InnerText
$key = "$title ($fileRel)"

$fileDB = @{}
$score = 0
$metrics.GetEnumerator() | ForEach-Object {
   $xpath = $_.Value
   $metricName = ($_.Key | Out-String).Trim()
   $metricValue = $fileContent.SelectNodes($xpath).Count
   $score += $metricValue
   $fileDB.Add($metricName, $metricValue)
}
$metricDB.Add($key, $fileDB)
And finally, output the contents of the metricDB hash as a CSV.
$metricDB.GetEnumerator() |
    ForEach-Object {
        $key = ($_.Key | Out-String).Trim()
        $val = $_.Value
        $total = 0
        $metricValues = $val.GetEnumerator() | Sort-Object Name |
            ForEach-Object {
                $value = [int]$_.Value
                $total += $value
                $value
            }
        $metricValues = [string]::Join(",", $metricValues)
        Write-Output "$key,$total,$metricValues"
    }
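The loop above emits only data rows; the header row that appears in the sample output can be built the same way. A minimal sketch, where the small $metrics hash is a stand-in for the full one defined earlier, and the alphabetical sort matches the Sort-Object Name used for the per-topic values:

```powershell
# Sketch: build a header row whose column order matches the per-topic
# values, which are emitted sorted by metric name. This $metrics is a
# partial stand-in for the full hash defined earlier.
$metrics = @{
  'Steps' = '//step[not(substeps)] | //substep'
  'Notes' = '//note'
}
$header = 'Title,Total,' + (($metrics.Keys | Sort-Object) -join ',')
Write-Output $header
```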

As you can see, the score for each topic is simply the sum of all the other metrics.

Sample output:

Title,Total,Branches,Cautions,Choice points,Dangers,GUI screens/menus,Interface switches,nCLI commands,Non-root commands,Notes,root commands,Steps,Typed text,User-supplied parameters,Warnings
To Configure a Host IP Address (ip_config\t_reconfigure_a_host_ip_address.dita),39,0,0,0,0,26,0,0,0,0,0,13,0,0,0

Then I uploaded the resulting CSV to Google Docs, sorted by the “Total” column, and let the engineers take a look.
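One way to capture the report as a file for upload is to send the pipeline output through Set-Content. A minimal sketch, where metrics.csv is an assumed filename and the $report array stands in for the real pipeline output:

```powershell
# Sketch: persist the CSV lines to a file for upload. The filename and
# the contents of $report are illustrative stand-ins.
$report = @(
  'Title,Total,Notes,Steps',
  'Example task (t_example.dita),5,2,3'
)
$report | Set-Content -Path metrics.csv
```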

It was clear that certain procedures were unusually complex, which gave the engineers areas to focus on. In a few cases, they showed me how a procedure could be written more simply, and I was glad to rewrite it. In others, they saw that the simplification needed engineering work. When the next release comes out, I can calculate the complexity again and demonstrate that the procedures have become simpler.

5 thoughts on “Reporting on your repository with PowerShell, part 2”

  1. That’s really cool. Could you run it against the OT’s userguide bookmap and see what kind of results you get? It would be great to see a formatted example.

    Would also be fun to add this to the QA report.

    • These functions work fine as long as the hash table contains simple objects. This won’t work with nested hash tables or more complex values like a WMI object. For that I would need to use XML to properly handle the object. Guess I’ll have to take this another step. Still, I think most people use hash tables with pretty simple values, so hopefully what I have here will meet most requirements.

  2. I think the technique is generalizable, but what “complexity” means will differ from situation to situation. Although steps/substeps will almost always be relevant and uicontrol would be relevant for software products, those in themselves wouldn’t be all that interesting. The other metrics I chose are specifically related to a Linux-based virtualization system. Number of interfaces is a pretty interesting metric from a usability standpoint–but the expressions that count interfaces wouldn’t make sense for anyone but me.

    Conditional steps would also be a good general metric if there were some reliable way of testing for it. I think that conditional steps should have their own element, and not just a regular step/cmd that starts with “If”.

    • Always good to have many different approaches. Yes, size can be an issue with large hashes. Using a CSV would be fast and more compact. For an even more compact and useful approach, try using an ADO Recordset saved as a Microsoft binary format “persisted recordset” file. This also allows for fast sorts and queries and is a good technique to have in your toolbox.

      Jeff, I never thought of using a CSV to persist objects. It is a very useful method which I am sure I will use.

  3. JV is right about the clixml being much more convenient. It does have the downside of only being useful in PowerShell, and the .xml file could be orders of magnitude larger than the .csv equivalent. I’ve done some of this sort of thing, and I’ll offer a possible alternative for the import:

     if ($_.type -and ($_.value -as $_.type)) {
         $hash[$_.key] = $_.value -as $_.type
     } else {
         Write-Warning "Value $($_.value) failed cast as type $($_.type) for key $($_.key). Leaving value as string."
         $hash[$_.key] = $_.value
     }

     That has the advantage of also checking whether the value will cast to that type before it creates the key.
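On the conditional-step question raised in comment 2: absent a dedicated element, one rough heuristic is an XPath test for commands whose text begins with “If”. A sketch against an inline sample topic follows; the element names match the DITA task model used in the post, but the matching is purely textual and will miss rephrased conditions.

```powershell
# Heuristic sketch: count steps whose <cmd> text starts with "If ",
# as a rough stand-in for a dedicated conditional-step element.
$xml = [xml]@'
<task><taskbody><steps>
  <step><cmd>If the host is unreachable, restart the service.</cmd></step>
  <step><cmd>Log in to the console.</cmd></step>
</steps></taskbody></task>
'@
$conditional = $xml.SelectNodes('//step[starts-with(normalize-space(cmd), "If ")]').Count
```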
