Friday, January 18, 2013

Freeing the Plum Book

The federal government produces reams of publications, ranging from the useful to the esoteric. Pick a topic, and in most cases you’ll find a relevant government publication: for example, recent Times articles about presidential appointments draw on the Plum Book. Published annually by either the House or the Senate (the task alternates between committees), the Plum Book is a snapshot of appointments throughout the federal government.

The Plum Book is clearly a useful resource for reporters. But like many products of the Government Printing Office, its two main publication formats are print and PDF. That means the digital version isn’t particularly searchable, unless you count Ctrl-F as a legitimate search mechanism. And that’s a shame, because the Plum Book is basically a long list of names, positions and salary information. It’s data.

In previous years, extracting data from the Plum Book involved dumping out its contents as text, perhaps using xPDF or a similar utility, and then cleaning up the resulting file to properly align the columns. (The results can still be quite messy.) But this year, the G.P.O. also released a mobile version that permits users to browse the Plum Book and its data by branch, agency, position and other criteria. When I learned of this new version, I suspected it could help me free the Plum Book data from the bounds of its traditional formats. Here’s how I used the mobile site to contribute to Times reporter Annie Lowrey’s story on the gender composition of President Obama’s appointees.

The Plum Book mobile site was built with Backbone.js. A little exploration (with help from my colleague Jeremy Ashkenas, Backbone’s creator) revealed JSON data on the back end for each position. Here’s an example:

{
  "location": "Washington, DC",
  "id": 1,
  "title": "Secretary",
  "expires": "",
  "branch": "Executive",
  "tenure": "",
  "agcy_name": "Department of Agriculture",
  "org_name": "Office of The Secretary",
  "pborg_seq": "6752",
  "pborg_managed_by": "6751",
  "org_order": "10",
  "name_of_incumbent": "Thomas James Vilsack",
  "type_of_appt": "Presidential Appointment with Senate Confirmation(PAS)",
  "pay_plan": "Executive Schedule(EX)",
  "pay": "I",
  "pb_order": "5"
}
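
A record like this can be read with Ruby's standard JSON library. The snippet below is only an illustrative sketch: the record is abridged from the example above, and the `summarize` helper is a hypothetical name introduced here, not part of the site or of the script used for the story.

```ruby
require "json"

# One position record, abridged from the example above.
RECORD_TEXT = <<~RECORD
  {
    "location": "Washington, DC",
    "id": 1,
    "title": "Secretary",
    "agcy_name": "Department of Agriculture",
    "org_name": "Office of The Secretary",
    "name_of_incumbent": "Thomas James Vilsack",
    "type_of_appt": "Presidential Appointment with Senate Confirmation(PAS)",
    "pay_plan": "Executive Schedule(EX)",
    "pay": "I"
  }
RECORD

# Hypothetical helper: reduce a parsed record to a one-line summary.
def summarize(record)
  "#{record['name_of_incumbent']}, #{record['title']}, #{record['agcy_name']}"
end

# JSON.parse returns a Hash keyed by the field names shown in the record.
position = JSON.parse(RECORD_TEXT)
puts summarize(position)
```

Every value except `id` arrives as a string, so fields such as `pborg_seq` would need an explicit `Integer(...)` conversion before any numeric sorting.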

Using Ruby, I wrote a script that cycled through the positions and saved the JSON locally. From there, I generated friendlier files that Lowrey could use as she crafted her story. But the Plum Book data doesn’t include the gender of appointees, so I found a Ruby library that attempts to identify gender based on a person’s first name. This worked for a large chunk of the data, and Lowrey and researcher Kitty Bennett were able to manually fill out the several hundred incomplete records.
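
The general shape of that workflow can be sketched as follows. This is not the original script, and several pieces are assumptions made for illustration: `harvest` and `guess_gender` are hypothetical names, the two-entry lookup table stands in for a real name-to-gender library (which is not identified here), and the `fetcher` argument is any callable that returns the JSON text for a position id, so the site's actual endpoint does not have to be guessed at.

```ruby
require "json"
require "fileutils"

# Hypothetical stand-in for a name-to-gender library: a tiny lookup table.
# A real library would carry a far larger list of first names.
FIRST_NAME_GENDER = {
  "thomas"   => "male",
  "kathleen" => "female"
}.freeze

# Guess gender from the first word of the incumbent's name.
# Returns nil when the name is blank or not in the table, which flags
# the record for manual review.
def guess_gender(full_name)
  first = full_name.to_s.split.first
  return nil if first.nil?

  FIRST_NAME_GENDER[first.downcase]
end

# Cycle through position ids, save each raw record as <dir>/<id>.json,
# and return the parsed records with a guessed "gender" field added.
# `fetcher` takes an id and returns a JSON string, or nil if nothing
# exists for that id.
def harvest(ids, dir, fetcher)
  FileUtils.mkdir_p(dir)
  ids.each_with_object([]) do |id, records|
    body = fetcher.call(id)
    next if body.nil?

    File.write(File.join(dir, "#{id}.json"), body)
    record = JSON.parse(body)
    records << record.merge("gender" => guess_gender(record["name_of_incumbent"]))
  end
end
```

In a real run the fetcher would wrap an HTTP GET (for instance `Net::HTTP.get` from Ruby's standard library) pointed at the site's own URLs. Records that come back with a `nil` gender are the ones left for someone to fill in by hand.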

This fairly simple combination of JSON, a Ruby script and a Ruby library produced results that were more useful, not to mention much more quickly generated, than anything the PDF could offer. The data became not only a significant part of Lowrey’s article, but also the foundation of Alicia Parlapiano’s graphic that illustrates the gender ratio in 15 cabinet departments.
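
A per-department breakdown of that kind comes down to a group-and-count over the records. The sketch below assumes each record already carries a `gender` field of "female", "male" or nil; `gender_breakdown` is a hypothetical helper, and the department names in the sample are placeholders rather than Plum Book data.

```ruby
# Tally genders per agency and compute the share of women among the
# records whose gender is known. Records with a nil gender are counted
# as "unknown" so they can be reviewed rather than silently dropped.
def gender_breakdown(records)
  records.group_by { |r| r["agcy_name"] }.map do |agency, rows|
    counts = Hash.new(0)
    rows.each { |r| counts[r["gender"] || "unknown"] += 1 }

    known = counts["female"] + counts["male"]
    share = known.zero? ? nil : (counts["female"].to_f / known).round(2)
    [agency, { "counts" => counts, "female_share" => share }]
  end.to_h
end

# Placeholder rows, invented purely to exercise the function.
sample = [
  { "agcy_name" => "Department A", "gender" => "female" },
  { "agcy_name" => "Department A", "gender" => "male" },
  { "agcy_name" => "Department A", "gender" => nil },
  { "agcy_name" => "Department B", "gender" => "male" }
]

breakdown = gender_breakdown(sample)
breakdown.each do |agency, stats|
  puts "#{agency}: #{stats['female_share'].inspect} female share"
end
```

Keeping the unknowns out of the denominator is one design choice among several; counting them in would understate both shares until the manual review is finished.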

We’ve released the original data, minus the gender, in JSON and YAML formats on GitHub so others can use it without having to pull down the data from the mobile site. (You can also download an Excel version.) Because the Plum Book is published only once each year, the data doesn’t change between editions, so we look forward to supplementing and enhancing it further at the end of this year.
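
Moving between the two text formats takes only Ruby's standard library. The following is a minimal sketch that assumes the positions are held as a single JSON array, which may not match how the repository's files are actually laid out; `json_to_yaml` is a hypothetical helper name.

```ruby
require "json"
require "yaml"

# Convert a JSON array of position records into a YAML document.
def json_to_yaml(json_text)
  JSON.parse(json_text).to_yaml
end

# A one-record sample using fields from the example record earlier.
sample_json = '[{"id":1,"title":"Secretary","agcy_name":"Department of Agriculture"}]'

yaml_text = json_to_yaml(sample_json)
puts yaml_text
```

Loading the YAML back with `YAML.safe_load` should give the same array of hashes, which makes for a quick check that nothing was lost in the conversion.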

As our work with the Plum Book demonstrates, data can make a profound difference to an article, providing a more complete picture and dramatically reducing research time. If you do something interesting with our GitHub repo or decide to try your hand at transforming a traditional government publication into useful, accessible data, be sure to let us know in the comments.


