Export to PDF using iText and Flying Saucer

In my previous post I tried to generate PDFs on the fly using the iText library. My goal was to convert an HTML snippet into a PDF. Unfortunately, as I discovered, iText alone is not powerful enough as an HTML parser, and it is not flexible enough when it comes to CSS. That's understandable, since iText's main job is PDF generation, not HTML parsing.

While looking for a way around iText's limitations, I came across the Flying Saucer Java library. Flying Saucer is an XML/XHTML/CSS 2.1 renderer that uses iText and can render XHTML with CSS stylesheets, either static or generated, directly to PDF.
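Because Flying Saucer is an XML renderer, the markup it receives has to be well-formed XHTML, so an HTML snippet from a blog entry needs a proper document around it first. Here is a minimal sketch of that preparation step using only the JDK. The class name `XhtmlWrapper` and its methods are made up for illustration and are not part of Flying Saucer or iText.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

/** Hypothetical helper: wraps an HTML snippet in an XHTML document with inline CSS. */
final class XhtmlWrapper {

    /** Builds a complete XHTML document around the given body markup. */
    static String wrap(String title, String css, String bodyMarkup) {
        return "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
             + "<head><title>" + escape(title) + "</title>"
             + "<style type=\"text/css\">" + css + "</style></head>"
             + "<body>" + bodyMarkup + "</body></html>";
    }

    /** Escapes the characters that are significant in XML text. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Returns true if the document parses as XML, which an XML-based renderer requires. */
    static boolean isWellFormed(String xhtml) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
            return true;
        } catch (Exception e) {
            // Parse errors (and anything else) mean the markup is not usable as-is.
            return false;
        }
    }

    public static void main(String[] args) {
        String css = "h1 { color: #336699; } p { font-family: serif; }";
        String good = wrap("My post", css, "<h1>Title</h1><p>Line one<br/>line two</p>");
        String bad = wrap("My post", css, "<p>Line one<br>line two</p>"); // unclosed <br>
        System.out.println(isWellFormed(good));
        System.out.println(isWellFormed(bad));
    }
}
```

With Flying Saucer on the classpath, the resulting string would then go to the renderer. As far as I recall, that is `ITextRenderer.setDocumentFromString(...)` followed by `layout()` and `createPDF(OutputStream)`, but check those names against the version you use.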

I have to say that Flying Saucer does a beautiful job. You can check it out by exporting the current post to PDF :)

Joshua Marinacci, the Flying Saucer project lead, wrote a nice tutorial that explains how to generate PDFs with Flying Saucer.

Export to PDF Using iText Java-PDF Library

I had some time this weekend, so I used iText, a free Java PDF library, to write a plugin for the Pebble blogging software. The plugin exports blog entries as PDF documents.

I liked the library, with one exception: converting HTML snippets to PDF. The library does let you set styles for HTML tags during export.

The conversion is done with the help of the HTMLWorker class. It is also possible to assign different styles to the tags supported by HTMLWorker:

[html]
ol ul li a pre font span br p div body table td th tr i b u sub sup em
strong s strike h1 h2 h3 h4 h5 h6 img
[/html]
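Since HTMLWorker only handles the tags listed above, it can help to check a snippet for tags outside that list before exporting it. Here is a rough sketch using only the JDK. The class name `UnsupportedTagFinder` is made up for illustration, the tag set is simply copied from the list above (other iText versions may support a different set), and the regular expression is deliberately naive.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-flight check: lists tags in a snippet that are outside the supported set. */
final class UnsupportedTagFinder {

    // Tag list copied from the post; treat it as an assumption for your iText version.
    static final Set<String> SUPPORTED = new LinkedHashSet<>(Arrays.asList(
        ("ol ul li a pre font span br p div body table td th tr i b u sub sup em "
       + "strong s strike h1 h2 h3 h4 h5 h6 img").split(" ")));

    // Naive matcher for opening tags; it does not handle comments or CDATA sections.
    private static final Pattern OPEN_TAG = Pattern.compile("<([A-Za-z][A-Za-z0-9]*)");

    /** Returns the lower-cased names of tags not in SUPPORTED, in order of first appearance. */
    static Set<String> findUnsupported(String html) {
        Set<String> unsupported = new LinkedHashSet<>();
        Matcher m = OPEN_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group(1).toLowerCase(Locale.ROOT);
            if (!SUPPORTED.contains(tag)) {
                unsupported.add(tag);
            }
        }
        return unsupported;
    }

    public static void main(String[] args) {
        String snippet = "<h2>Title</h2><blockquote>Quote</blockquote><p>Text <code>x</code></p>";
        System.out.println(findUnsupported(snippet)); // prints [blockquote, code]
    }
}
```

An empty result only means every tag is on the list. It says nothing about how well the styles for those tags will come out in the PDF.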

Unfortunately, there isn't much documentation on what you can do with styles. After poking through the source code and going through the iText mailing lists for examples, I found my results a bit disappointing.

The PDF export works fine, except when a blog entry contains images. In that case, the images are exported to the PDF with text overlaid on top of them.

I am hoping that people who have done a lot of work with iText in the past will be able to share their experience.

Recent update:
In a later post, I talk about the Flying Saucer Java library, an XML/XHTML/CSS 2.1 renderer that uses iText and can render XHTML with CSS stylesheets, either static or generated, directly to PDF.