Generating PDFs is, I feel, the most challenging part of developing any web application. In our particular case, it was made even more difficult by the requirements we faced:
- Several languages: Our PDF generation engine would need to support Chinese, Japanese, Korean, German, Portuguese, and even some RTL languages.
- Layouts: Our invoice layouts were created in HTML and were quite crazy. Some even had styling performed by post-load JS. ::gulp::
- Tables: They should break gracefully across page breaks.
- Complex headers/footers: Images, page numbers, etc.
- Fast: Goes without saying, right?
In the past, we had experimented with several engines, including PrinceXML, which was a tad expensive.
One of my colleagues at work, Bharat Mhaskar, evaluated wkhtmltopdf and realized that with the Qt patch, it would be able to render accurate PDFs given semantically correct HTML and JS. He spent days perfecting the method to install it on an Ubuntu 10.04 server with the Qt patch, and it is something that we’ve been using extensively in our apps. Below, I outline the installation procedure, which will possibly save several people out there copious amounts of time:
- Install xvfb to run wkhtmltopdf headless:
- Install libraries that are required:
- Install required fonts:
If this gives an error, make sure that the universe and multiverse repositories are enabled in your apt sources.
- Install git source of wkhtmltopdf:
- Install PDFtk. This is an optional step; the library helps stitch together several PDFs.
- Install the actual wkhtmltopdf:
Reference: https://github.com/jdpace/PDFKit/wiki/Installing-WKHTMLTOPDF
- Install whatever fonts you need. If you have an Ubuntu desktop machine, it is even simpler. Install locales using the GUI on your desktop machine. This will also install the required fonts. Then, copy these to the target machine:
- Rebuild font cache:
- Test wkhtmltopdf:
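As a rough illustration of the final test step, here is a minimal Python sketch that builds the command line for a headless render and, optionally, for stitching PDFs with PDFtk. The function names (`build_render_command`, `build_stitch_command`, `run`) are hypothetical helpers made up for this example, not part of either tool. It assumes that `xvfb-run`, `wkhtmltopdf`, and `pdftk` are on the `PATH` after the steps above.

```python
import shutil
import subprocess


def build_render_command(source, output, use_xvfb=True):
    """Return the argv list for rendering `source` (URL or HTML file) to `output`."""
    command = ["wkhtmltopdf", "--quiet", "--encoding", "utf-8", source, output]
    if use_xvfb:
        # xvfb-run starts a throwaway virtual X server; -a picks a free display number.
        command = ["xvfb-run", "-a"] + command
    return command


def build_stitch_command(parts, output):
    """Return the argv list that makes PDFtk concatenate `parts` into `output`."""
    return ["pdftk"] + list(parts) + ["cat", "output", output]


def run(command):
    """Run a command built above, failing loudly if the binary is not installed."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(command[0] + " is not on PATH")
    subprocess.run(command, check=True)
```

Calling `run(build_render_command("http://www.google.com", "test.pdf"))` should leave a `test.pdf` in the current directory if the installation went well. Opening it and checking that non-Latin text renders correctly is a quick way to verify the fonts.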
It seems like a longish process, but it works wonders. For a comprehensive list of the features and options that are available, see the wkhtmltopdf manual.
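To give a flavour of those options, the sketch below assembles a few of the header and footer flags that matter for invoice-style documents. The helper name `header_footer_options`, the margin values, and the footer text are assumptions chosen for illustration. The flags themselves (`--header-html`, `--footer-center`, and the `[page]`/`[topage]` placeholders) are documented wkhtmltopdf options, and they need the build with the Qt patch.

```python
def header_footer_options(header_html=None, footer_text="Page [page] of [topage]"):
    """Return wkhtmltopdf options for margins, an HTML header and a text footer."""
    # Leave room for the header and footer; the sizes here are illustrative.
    options = ["--margin-top", "25mm", "--margin-bottom", "20mm"]
    if header_html:
        # The header is itself an HTML file, so it can contain images and styling.
        options += ["--header-html", header_html, "--header-spacing", "5"]
    if footer_text:
        # [page] and [topage] are replaced by wkhtmltopdf on every page.
        options += ["--footer-center", footer_text, "--footer-font-size", "8"]
    return options


def build_invoice_command(source, output, header_html=None):
    """Return the full argv list: options first, then input and output."""
    return ["wkhtmltopdf"] + header_footer_options(header_html) + [source, output]
```

The resulting list can be passed to `subprocess.run`, prefixed with `xvfb-run -a` on a headless server.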
We deployed our PDF generation machine onto an Amazon EC2 instance and developed an API so that all our apps can use the same PDF generation engine. We are also rolling out our own Drupal module that wraps this functionality and can be used by most site builders as a drop-in. All current PDF conversion modules for Drupal require that an additional library be installed. This, hopefully, will allow Drupallers to roll out PDF conversion capabilities on their sites with minimal effort. :-)
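For readers wondering what such a shared PDF service might look like, here is a deliberately small sketch using only the Python standard library. It is not our actual API: the `PdfHandler` class, the `render_html` helper, and the port are all hypothetical, and a real service would need authentication, size limits, and queuing. It accepts HTML in a POST body and returns the rendered PDF.

```python
import os
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_html(html_bytes):
    """Render HTML bytes to PDF bytes via wkhtmltopdf (assumed to be on PATH)."""
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "input.html")
        target = os.path.join(workdir, "output.pdf")
        with open(source, "wb") as handle:
            handle.write(html_bytes)
        subprocess.run(
            ["xvfb-run", "-a", "wkhtmltopdf", "--quiet", source, target],
            check=True,
        )
        with open(target, "rb") as handle:
            return handle.read()


class PdfHandler(BaseHTTPRequestHandler):
    # Kept as a class attribute so the renderer can be swapped out in tests.
    renderer = staticmethod(render_html)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        html = self.rfile.read(length)
        try:
            pdf = self.renderer(html)
        except (OSError, subprocess.CalledProcessError):
            self.send_error(500, "PDF rendering failed")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("Content-Length", str(len(pdf)))
        self.end_headers()
        self.wfile.write(pdf)


def serve(port=8080):
    """Serve on localhost only; put a proper web server in front for real use."""
    HTTPServer(("127.0.0.1", port), PdfHandler).serve_forever()
```

With `serve()` running, any app that can make an HTTP POST can obtain a PDF without installing wkhtmltopdf itself, which is the main appeal of centralising the engine.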