Setting up WKHTMLTOPDF on Ubuntu Server 10.04

By Avadhut Phatarpekar

Generating PDFs is, I feel, the most challenging part of developing any web application. In our particular case, it was made even more difficult by the requirements we faced:

  • Several languages: Our PDF generation engine would need to support Chinese, Japanese, Korean, German, Portuguese, and even some RTL languages.
  • Layouts: Our invoice layouts were created in HTML and were quite crazy. Some even had styling performed by post-load JS. ::gulp::
  • Tables: They should break gracefully with page-breaks.
  • Complex headers/footers: Images, page numbers, etc.
  • Fast: Goes without saying, right?

In the past, we had experimented with several engines such as PrinceXML, but it was a tad expensive.

One of my colleagues at work, Bharat Mhaskar, evaluated WKHTMLTOPDF and realized that with the Qt patch, it would be able to render accurate PDFs given semantically correct HTML and JS. He spent days perfecting the method to install it on Ubuntu Server 10.04 with the Qt patch, and it is something that we’ve been using extensively in our apps. Below, I am outlining the installation procedure, possibly saving several people out there copious amounts of time:

  1. Install xvfb to run wkhtmltopdf headless:
sudo apt-get install xvfb
  1. Install libraries that are required:
sudo aptitude install openssl build-essential xorg libssl-dev libxrender-dev
sudo apt-get build-dep qt4-x11
  1. Install required fonts:
sudo apt-get install msttcorefonts
sudo apt-get install ttf-ipafont-*

If this gives an error, make sure that the universe and multiverse components are enabled in your apt sources.
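
A quick way to check is to look for active (uncommented) `deb` lines that mention each component. The sketch below assumes the classic `/etc/apt/sources.list` layout; the helper name `has_component` and the `SOURCES_LIST` variable are made up for illustration.

```shell
#!/bin/sh
# has_component FILE COMPONENT  (hypothetical helper)
# Succeeds if FILE has an uncommented "deb" line that lists COMPONENT.
has_component() {
    grep -Eq "^[[:space:]]*deb[[:space:]].*[[:space:]]$2([[:space:]]|\$)" "$1"
}

# Report on both components; SOURCES_LIST defaults to the usual Ubuntu path.
SOURCES_LIST="${SOURCES_LIST:-/etc/apt/sources.list}"
if [ -r "$SOURCES_LIST" ]; then
    for comp in universe multiverse; do
        if has_component "$SOURCES_LIST" "$comp"; then
            echo "$comp: enabled"
        else
            echo "$comp: not enabled"
        fi
    done
fi
```

If a component shows as not enabled, add it to the relevant `deb` line, run `sudo apt-get update`, and retry the font installation.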

  1. Install git source of wkhtmltopdf:
sudo aptitude install git-core
git clone git://github.com/antialize/wkhtmltopdf.git wkhtmltopdf
git clone git://gitorious.org/+wkhtml2pdf/qt/wkhtmltopdf-qt.git wkhtmltopdf-qt
cd wkhtmltopdf-qt
git checkout staging
cat ../wkhtmltopdf/static_qt_conf_base ../wkhtmltopdf/static_qt_conf_linux | sed -re 's/#.*//'
./configure -nomake tools,examples,demos,docs,translations -opensource -prefix "../wkqt"
make -j3 && make install
cd ..
cd wkhtmltopdf
../wkqt/bin/qmake
make && make install
  1. Install PDFtk. This is an optional step; the library helps stitch together several PDFs.
sudo apt-get install pdftk
  1. Install the actual wkhtmltopdf binary:
wget http://wkhtmltopdf.googlecode.com/files/wkhtmltopdf-0.9.9-static-amd64.tar.bz2
tar xvjf wkhtmltopdf-0.9.9-static-amd64.tar.bz2
sudo mv wkhtmltopdf-amd64 /usr/local/bin/wkhtmltopdf
sudo chmod +x /usr/local/bin/wkhtmltopdf

Reference: https://github.com/jdpace/PDFKit/wiki/Installing-WKHTMLTOPDF
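
Before moving on to fonts, it may be worth confirming that the tools installed so far can actually be found on `PATH`. This is a minimal sketch; `missing_tools` is a hypothetical helper name, and pdftk only matters if you chose to install it.

```shell
#!/bin/sh
# missing_tools NAME...  (hypothetical helper)
# Prints each NAME that cannot be found on PATH; prints nothing if all exist.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools installed in the steps above; pdftk is optional.
missing=$(missing_tools xvfb-run wkhtmltopdf pdftk)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found on PATH:" $missing
else
    echo "All tools found."
fi
```

Running `wkhtmltopdf --version` afterwards shows which build is being picked up.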

  1. Install whatever fonts you need. If you have an Ubuntu desktop machine, it is even simpler. Install locales using the GUI on your desktop machine. This will also install the required fonts. Then, copy these to the target machine:
scp -r /usr/share/fonts/* user@PDFGenerator_TARGET_MACHINE:/usr/share/fonts/
  1. Rebuild font cache:
fc-cache -fv
  1. Test wkhtmltopdf:
xvfb-run -a -s "-screen 0 640x480x16" wkhtmltopdf --dpi 200 --page-size A4 --margin-top 150 http://www.editage.kr korea.pdf
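
If the test PDF shows empty boxes instead of glyphs, the usual suspect is a missing font rather than wkhtmltopdf itself. The sketch below asks fontconfig how many font files it knows for a few languages; `fonts_for_lang` is a hypothetical helper, and the language codes are only examples.

```shell
#!/bin/sh
# fonts_for_lang LANG  (hypothetical helper)
# Prints how many font entries fontconfig lists for the given language code.
fonts_for_lang() {
    fc-list ":lang=$1" | wc -l | tr -d ' '
}

# Example language codes; adjust to the languages you need to render.
if command -v fc-list >/dev/null 2>&1; then
    for lang in zh-cn ja ko ar; do
        echo "$lang: $(fonts_for_lang "$lang") font(s)"
    done
fi
```

A count of 0 for a language suggests going back to the font installation step and rebuilding the cache with `fc-cache -fv`.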

It seems like a longish process, but it works wonders. For a comprehensive list of the available features and options, see the WKHTMLTOPDF manual.
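
Since every conversion needs the same `xvfb-run` preamble, it can be convenient to wrap the test command from the last step in a small shell function. This is only a sketch: the function name `render_pdf` and the variables `PDF_DPI`, `PDF_PAGE_SIZE`, `PDF_MARGIN_TOP`, and `DRY_RUN` are hypothetical, and the defaults simply mirror the test command.

```shell
#!/bin/sh
# render_pdf URL OUTPUT  (hypothetical wrapper)
# Runs wkhtmltopdf under a virtual X server with the options from the test step.
# Set DRY_RUN=1 to print the command instead of running it.
render_pdf() {
    url=$1
    out=$2
    # Rebuild the positional parameters as the full command line.
    set -- xvfb-run -a -s "-screen 0 640x480x16" \
        wkhtmltopdf --dpi "${PDF_DPI:-200}" --page-size "${PDF_PAGE_SIZE:-A4}" \
        --margin-top "${PDF_MARGIN_TOP:-150}" "$url" "$out"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

With this in place, `render_pdf http://www.editage.kr korea.pdf` repeats the test above, and `PDF_PAGE_SIZE=Letter render_pdf ...` overrides a single option without retyping the rest.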

We deployed our PDF generation machine on an Amazon EC2 instance and developed an API so that all our apps can use the same PDF generation engine. We are also rolling out our own Drupal module that wraps this functionality and can be used by most site builders as a drop-in; all current PDF conversion modules for Drupal require that an additional library be installed. This, hopefully, will allow Drupallers to roll out PDF conversion capabilities on their sites with minimal effort. :-)