The federal government produces reams of publications, ranging from the useful to the esoteric. Pick a topic, and in most cases you'll find a relevant government publication: for example, recent Times articles about presidential appointments draw on the Plum Book. Published every four years by either the House or the Senate (the task alternates between committees), the Plum Book is a snapshot of appointments throughout the federal government.
The Plum Book is clearly a useful resource for reporters. But like many products of the Government Printing Office, its two main publication formats are print and PDF. That means the digital version isn't particularly searchable, unless you count Ctrl-F as a legitimate search mechanism. And that's a shame, because the Plum Book is basically a long list of names, positions and salary information. It's data.
In previous years, extracting data from the Plum Book involved dumping out its contents as text, perhaps using xPDF or a similar utility, and then cleaning up the resulting file to properly align the columns. (The results can still be quite messy.) But this year, the G.P.O. also released a mobile version that permits users to browse the Plum Book and its data by branch, agency, position and other criteria. When I learned of this new version, I suspected it could help me free the Plum Book data from the bounds of its traditional formats. Here's how I used the mobile site to contribute to Times reporter Annie Lowrey's story on the gender composition of President Obama's appointees.
The Plum Book mobile site was built with Backbone.js. A little exploration (with help from my colleague Jeremy Ashkenas, Backbone's creator) revealed JSON data on the back end for each position. Here's an example:
{
  "location": "Washington, DC",
  "id": 1,
  "title": "Secretary",
  "expires": "",
  "branch": "Executive",
  "tenure": "",
  "agcy_name": "Department of Agriculture",
  "org_name": "Office of The Secretary",
  "pborg_seq": "6752",
  "pborg_managed_by": "6751",
  "org_order": "10",
  "name_of_incumbent": "Thomas James Vilsack",
  "type_of_appt": "Presidential Appointment with Senate Confirmation(PAS)",
  "pay_plan": "Executive Schedule(EX)",
  "pay": "I",
  "pb_order": "5"
}
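To show why a record like this is easier to work with than text dumped from a PDF, here is a minimal Ruby sketch that parses a trimmed copy of the example and builds a one-line summary. The field names come from the record above; the `summary` format is only an illustration, not something the mobile site produces.

```ruby
require "json"

# A trimmed copy of the example record above (only a few of its fields).
raw = '{"id":1,"title":"Secretary","agcy_name":"Department of Agriculture",' \
      '"name_of_incumbent":"Thomas James Vilsack",' \
      '"pay_plan":"Executive Schedule(EX)","pay":"I"}'

# JSON.parse turns the text into a plain Ruby Hash keyed by field name.
position = JSON.parse(raw)

# Illustrative summary line: incumbent, title, agency.
summary = [
  position["name_of_incumbent"],
  position["title"],
  position["agcy_name"]
].join(", ")

puts summary
```

Every field is addressable by name, so there are no columns to realign by hand.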
Using Ruby, I wrote a script that cycled through the positions and saved the JSON locally. From there, I generated friendlier files that Lowrey could use as she crafted her story. But the Plum Book data doesn't include the gender of appointees, so I found a Ruby library that attempts to identify gender based on a person's first name. This worked for a large chunk of the data, and Lowrey and researcher Kitty Bennett were able to manually fill out the several hundred incomplete records.
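The two steps described above might look roughly like the following sketch. It is not the original script: `save_positions`, `guess_gender` and the tiny `SAMPLE_NAMES` table are hypothetical names invented for illustration, and the lookup table is only a stand-in for the first-name library mentioned above. The fetching step is passed in as a callable so the sketch makes no assumption about the mobile site's URLs.

```ruby
require "json"
require "fileutils"

# Save each position's JSON into out_dir, one file per id.
# `fetcher` is any callable that takes an id and returns a JSON string,
# or nil when no position exists for that id (hypothetical interface).
def save_positions(ids, out_dir, fetcher)
  FileUtils.mkdir_p(out_dir)
  saved = []
  ids.each do |id|
    body = fetcher.call(id)
    next if body.nil? || body.empty? # skip ids with no data
    record = JSON.parse(body)
    File.write(File.join(out_dir, "#{id}.json"), JSON.pretty_generate(record))
    saved << record
  end
  saved
end

# Stand-in for a name-to-gender library: a tiny, hypothetical lookup table.
SAMPLE_NAMES = { "thomas" => :male, "kathleen" => :female }.freeze

# Guess gender from the first word of a full name; :unknown marks
# records that would need to be filled in by hand.
def guess_gender(full_name)
  first = full_name.to_s.split.first
  return :unknown if first.nil?
  SAMPLE_NAMES.fetch(first.downcase, :unknown)
end
```

In a real run, the `fetcher` would wrap an HTTP GET for each position, and anything that comes back as `:unknown` would go to a person to resolve, which is the manual step Lowrey and Bennett handled.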
This fairly simple combination of JSON, a Ruby script and a Ruby library produced results that were more useful (not to mention much more quickly generated) than anything the PDF could offer. The data became not only a significant part of Lowrey's article, but also the foundation of Alicia Parlapiano's graphic that illustrates the gender ratio in 15 cabinet departments.
We've released the original data, minus the gender, in JSON and YAML formats on GitHub so others can use it without having to pull down the data from the mobile site. (You can also download an Excel version.) Because the Plum Book is published only once every four years, the data won't change until the next edition, but we look forward to supplementing and enhancing it further.
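For readers curious how the same records can be offered in both formats, here is a minimal sketch using Ruby's standard `json` and `yaml` libraries. The `json_to_yaml` helper and the one-record sample are illustrative assumptions, not the code or files in the repository.

```ruby
require "json"
require "yaml"

# Convert a JSON array of position records into a YAML document.
# (Hypothetical helper; the released files may have been produced differently.)
def json_to_yaml(json_text)
  JSON.parse(json_text).to_yaml
end

# One-record sample using field names from the position example earlier.
sample = '[{"id":1,"title":"Secretary","agcy_name":"Department of Agriculture"}]'

yaml_text = json_to_yaml(sample)
puts yaml_text
```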
As our work with the Plum Book demonstrates, data can make a profound difference to an article, providing a more complete picture and dramatically reducing research time. If you do something interesting with our GitHub repo or decide to try your hand at transforming a traditional government publication into useful, accessible data, be sure to let us know in the comments.