Thursday, May 9, 2013

Cool Tools: pdf2htmlEX

A PDF does one thing very well: it presents an accurate image of a document that can be viewed on just about any device. Unfortunately, PDFs also cause grief for anyone who wants to use the data they contain. Governments, in particular, have a habit of releasing PDFs when the information would be more useful and accessible as a spreadsheet. The tools for extracting text from PDFs can be flaky, but Lu Wang’s pdf2htmlEX project offers a way around the problem. pdf2htmlEX converts PDFs into HTML5 documents while preserving the layout and appearance of the original.

The examples are pretty impressive. Being able to treat a PDF as a first-class citizen of the web seems like a step in the right direction. For an introduction to the tool, check out the QuickStart page.
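
To give a feel for what a conversion looks like in practice, here is a minimal Python sketch that shells out to the pdf2htmlEX command-line tool. It assumes pdf2htmlEX is installed and on your PATH, and that your version supports the --dest-dir and --zoom options and names the output after the input file with an .html extension. The helper names (build_command, convert) and the file names are made up for illustration, so check the project's own documentation before relying on any of this.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, dest_dir=".", zoom=None):
    """Build the argument list for a pdf2htmlEX run (hypothetical helper)."""
    cmd = ["pdf2htmlEX", "--dest-dir", str(dest_dir)]
    if zoom is not None:
        # --zoom scales the rendered page; assumed to be supported by your version.
        cmd += ["--zoom", str(zoom)]
    cmd.append(str(pdf_path))
    return cmd


def convert(pdf_path, dest_dir=".", zoom=None):
    """Run pdf2htmlEX and return the path where the HTML is expected to land."""
    if shutil.which("pdf2htmlEX") is None:
        raise RuntimeError("pdf2htmlEX was not found on PATH")
    subprocess.run(build_command(pdf_path, dest_dir, zoom), check=True)
    # Assumption: the default output name is the input's base name plus ".html".
    return Path(dest_dir) / (Path(pdf_path).stem + ".html")


if __name__ == "__main__":
    # "report.pdf" and "out" are placeholder names.
    print(convert("report.pdf", dest_dir="out", zoom=1.3))
```

Keeping the command construction separate from the actual run makes it easy to inspect exactly what will be executed before pointing it at a folder full of government reports.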
