Generate PDF with Flying Saucer and iText by Adding Font Files

In this post, I would like to show how to load font files and format specific sections of a document at run time when generating a PDF document with the Flying Saucer and iText libraries.

I will create a String that has the structure of a valid XHTML file. The content of the String is what is going to be rendered as a PDF document, since Flying Saucer knows how to render XHTML. Before generating the PDF, I will load a font file. Once the font file is loaded, I will retrieve its font family name and apply it as a style to a selected paragraph of the PDF document.

import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import com.lowagie.text.pdf.*;
import org.xhtmlrenderer.pdf.*;

public class TestFont {

	public static void main(String[] args) {
	try  {
		ITextRenderer renderer =  new ITextRenderer();
		File fontDir = new File(SOME_ABSOLUTE_PATH_TO_YOUR_FONT_DIR);

		//Build valid XHTML source for parsing
		StringBuffer buf = new StringBuffer();
		buf.append("<html>");
		buf.append("<head>");
		buf.append("</head>");
		buf.append("<body>");

		String body = "This is formatted paragraph";

		//Gets TTF or OTF font file from
		//the font directory
		if (fontDir.isDirectory()) {

			//Only add fonts with specific extensions
			File[] files = fontDir.listFiles( new FilenameFilter() {
			public boolean accept(File dir, String name) {
				String lower = name.toLowerCase();
				//Load TTF or OTF files
				return lower.endsWith(".otf") || lower.endsWith(".ttf");
				}
			});

				//listFiles() returns null if the directory cannot be read
				if (files != null && files.length > 0) {
					String fontFamilyName = "";
					//You should always embed TrueType fonts.
					renderer.getFontResolver().addFont(files[0].getAbsolutePath(),
BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

					//Get font family name from the BaseFont object.
					//All this work just to get font family name
					BaseFont font = BaseFont.createFont(files[0].getAbsolutePath(),
							BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
					fontFamilyName = TrueTypeUtil.getFamilyName(font);

					if (!fontFamilyName.equals("")) {
					//Wrap DIV with font family name around the content
						body = "<div style=\"font-family: " + fontFamilyName + ";\">" + body + "</div>";
					}
				}
		}

		buf.append("<p>This paragraph is unformatted</p>");
		buf.append("<p>" + body + "</p>");
		buf.append("</body>");
		buf.append("</html>");

		byte[] bytes = buf.toString().getBytes("UTF-8");

		ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
		DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
		InputSource is = new InputSource(bais);
		Document doc = builder.parse(is);

		renderer.setDocument(doc, null);
		renderer.layout();

		String filename = "document.pdf";
		BufferedOutputStream bufferedOutput = new BufferedOutputStream(new
FileOutputStream(filename));

		renderer.createPDF(bufferedOutput);
		bufferedOutput.flush();
		bufferedOutput.close();
		}
		catch (Exception e)
		{
			System.out.println(e.getMessage());
		}
	}
}

I hope the source code makes it clear how to add fonts and retrieve the font family name.
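
One detail of the listing that can be sketched separately is the step that wraps the content in a DIV with the font family name. Family names often contain spaces, and the name ends up inside an XHTML attribute, so it may be safer to quote it for CSS and escape it for XML. The class and method names below (FontStyleHelper, wrapWithFont, escapeAttribute) are hypothetical and not part of Flying Saucer or iText; this is only a minimal sketch, and it assumes the family name contains no apostrophe.

```java
public class FontStyleHelper {

    // Escapes the characters that are special inside an XML attribute value.
    public static String escapeAttribute(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;");
    }

    // Wraps the body (assumed to be valid XHTML already) in a DIV that selects
    // the given font family. The name is put in single quotes so that a name
    // with spaces stays a single CSS value.
    // Assumption: the family name does not contain an apostrophe.
    public static String wrapWithFont(String fontFamilyName, String body) {
        if (fontFamilyName == null || fontFamilyName.trim().isEmpty()) {
            return body; // no font family to apply
        }
        String family = escapeAttribute(fontFamilyName.trim());
        return "<div style=\"font-family: '" + family + "';\">" + body + "</div>";
    }

    public static void main(String[] args) {
        System.out.println(wrapWithFont("Bitstream Cyberbit", "This is formatted paragraph"));
    }
}
```

In the listing above, the value returned by TrueTypeUtil.getFamilyName(font) would be passed as the first argument.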

Regards,
Alex

Online PDF Generator

An online PDF generator is now available. This simple but useful tool lets you generate a PDF document online from HTML snippets. The tool uses the Flying Saucer library.

The PDF is generated with full compression and includes metadata in the PDF properties. Keep in mind that for successful PDF generation your HTML markup has to validate as XHTML 1.0 Transitional, which means, for example, that <img> becomes <img /> and every <p> must be closed by a matching </p>.
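
A quick way to catch unclosed tags before submitting a snippet is to run it through an XML parser. The sketch below uses only the standard JAXP classes; the class and method names (WellFormedCheck, isWellFormed) are made up for illustration. Note that it checks well-formedness only, which is a weaker condition than validating against the XHTML 1.0 Transitional DTD, and that the snippet needs a single root element.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {

    // Returns true if the markup parses as well-formed XML, false otherwise.
    public static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow parse errors instead of letting the default handler print them.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* warnings are ignored */ }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, e.g. <img> without a closing slash
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>One<img src=\"a.png\" /></p>"));
        System.out.println(isWellFormed("<p>One<img src=\"a.png\"></p>"));
    }
}
```

The first call has a self-closed <img /> and parses; the second leaves <img> open, so the parser rejects it at the closing </p>.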

You can provide a URL to your own CSS file with styles for your HTML snippet, or put a “style” attribute on the HTML elements. If neither a URL nor a “style” attribute is specified, default system styles are applied to your HTML snippet.
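
To illustrate the two styling options, here is a hedged sketch of how a snippet could be wrapped into a full XHTML page with an optional stylesheet link. This is not the code behind the online tool; the class and method names (SnippetPage, wrap) and the example URL are assumptions made for this example.

```java
public class SnippetPage {

    // Escapes the characters that are special inside an XML attribute value.
    static String escapeAttribute(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;");
    }

    // Wraps an XHTML snippet in html/head/body. If cssUrl is null or empty,
    // no stylesheet link is added and only "style" attributes inside the
    // snippet (or the renderer's defaults) apply.
    public static String wrap(String snippet, String cssUrl) {
        StringBuilder page = new StringBuilder();
        page.append("<html><head>");
        if (cssUrl != null && !cssUrl.isEmpty()) {
            page.append("<link rel=\"stylesheet\" type=\"text/css\" href=\"")
                .append(escapeAttribute(cssUrl))
                .append("\" />");
        }
        page.append("</head><body>").append(snippet).append("</body></html>");
        return page.toString();
    }

    public static void main(String[] args) {
        // Styling through an external stylesheet (hypothetical URL).
        System.out.println(wrap("<p>Hello</p>", "http://example.com/style.css"));
        // Styling through a "style" attribute on the element itself.
        System.out.println(wrap("<p style=\"color: red;\">Hello</p>", null));
    }
}
```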

I hope this helps.

Cheers

Export Pebble Blog Entry to PDF Plugin

In one of my previous posts, I described how I implemented a plugin for the Pebble blogging software that allows exporting blog entries to PDF.

Today, I modified the plugin to load font files at run time during PDF generation, which adds support for non-Latin languages. So now it is possible to export non-Latin characters from blog entries to PDF.

Below, I’ve added some content in different languages. To support non-Latin languages, I am using the font file cyberbit.ttf by Bitstream. You can test the plugin’s multilingual support by generating a PDF from the current post.

I think it comes out quite nicely. The only hitch at the moment is that Flying Saucer does not yet support right-to-left text in PDF.
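
Since right-to-left text is the one known gap, it can help to detect such entries before exporting, for example to show a warning. The sketch below uses java.text.Bidi from the standard library; the class and method names (RtlCheck, containsRightToLeft) are hypothetical and the check is not part of the plugin.

```java
import java.text.Bidi;

public class RtlCheck {

    // Returns true if the text contains characters that need bidirectional
    // layout, such as Arabic or Hebrew letters.
    public static boolean containsRightToLeft(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        System.out.println(containsRightToLeft("Hello"));                    // prints false
        System.out.println(containsRightToLeft("\u05E9\u05DC\u05D5\u05DD")); // Hebrew "shalom", prints true
    }
}
```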

Japanese Kanji:
五輪代表

Japanese Hiragana:
こんにちは、これは真実を決定するためにはテストテキストです。

Japanese Katakana:
ラドクリフ、マラソン

Japanese Kokuji:
和製漢字

Chinese Simplified:
您好,这是一个测试文本,以确定事实真相

Chinese Traditional:
您好,這是一個測試文本,以確定事實真相

Korean:
안녕하세요,이 사실을 확인하는 테스트 텍스트입니다

Arabic:
مرحبا ، هذا هو اختبار لتحديد نص الحقيقة

Hebrew:
שלום, זוהי בדיקה טקסט כדי לקבוע את האמת

Russian:
Привет, это тест текста, чтобы определить истину

Greek:
Γεια σας, αυτό το κείμενο είναι μια δοκιμασία για τον προσδιορισμό της αλήθειας

Thai:
สวัสดีนี่คือการทดสอบข้อความเพื่อตรวจสอบความจริง

Vietnamese:
Xin chào, đây là một bài kiểm tra văn bản để xác định sự thật

Turkish:
Merhaba, bu gerçeği belirlemek için bir test metin

Export to PDF using iText and Flying Saucer

In my previous post I attempted to generate PDF on the fly using the iText library. My goal was to convert an HTML snippet into PDF. Unfortunately, as I discovered, iText alone is not powerful enough as an HTML parser, and it is not flexible enough when it comes to CSS. That is understandable, since iText’s main functionality is PDF generation, not HTML parsing.

While trying to work around iText’s limitations, I came across the Flying Saucer Java library. Flying Saucer is an XML/XHTML/CSS 2.1 renderer that uses iText and can render XHTML with CSS stylesheets, either static or generated, directly to PDF.

I want to say that Flying Saucer does a beautiful job. You can check this out by trying to export the current post to PDF :)

Joshua Marinacci, the Flying Saucer project lead, wrote a nice tutorial that explains how to generate PDF using Flying Saucer.

Export to PDF Using iText Java-PDF Library

I had some time this weekend, so I used iText, a free Java-PDF library, to make a plugin for the Pebble blogging software. The plugin allows exporting blog entries to a PDF document.

I liked this library, except for one thing: converting HTML snippets to PDF. The library allows you to set styles for HTML tags during export.

The conversion is done with the help of the HTMLWorker class. It is also possible to assign different styles to the tags supported by HTMLWorker:

ol ul li a pre font span br p div body table td th tr i b u sub sup em
strong s strike h1 h2 h3 h4 h5 h6 img

Unfortunately, there isn’t much documentation on what you can do with styles. So after poking through the source code and going through the iText mailing lists for examples, my results were a bit disappointing.

The PDF export works fine, except when a blog entry has images. In that case, the images are exported to PDF with text overlaid on top of them.

I am hoping that some of the people who have done a lot of work with iText in the past will be able to share their experience.

Recent update:
In a later post, I talk about the Flying Saucer Java library, an XML/XHTML/CSS 2.1 renderer that uses iText and can render XHTML with CSS stylesheets, either static or generated, directly to PDF.