
PdfInclude: National characters in base fonts of PDF

Implementing national characters using the standard built-in fonts in PDF

This document briefly describes how to implement support for national characters in PDF files generated from Progress 4GL with the PDF Include tool. The general idea, however, concerns the PDF format itself and could be implemented in any other tool that generates PDF.

The problem

In many countries people use extra characters beyond the basic Latin alphabet. In Poland, for example, we use 9 extra characters, each in a capital and a small version, which makes 18 extra characters: Ą Ć Ę Ł Ń Ó Ś Ź Ż ą ć ę ł ń ó ś ź ż.

It is possible to use a TrueType font in PDF, but embedding a TT font adds an extra 300KB to the final PDF document. If you want to generate a single page and send it to 100 000 customers by email, that difference matters.

The solution

PDF comes with some standard fonts, the so-called Standard Type 1 Fonts, such as Times-Roman, Helvetica and Courier. I've discovered how to implement national characters using those standard fonts. Thanks to that it's possible to produce a small PDF document and avoid the 300KB overhead of embedded TT fonts.

Update: this does not work with Courier, only with Times-Roman and Helvetica.

To change the standard fonts you must put the national character definitions below the line:
/BaseEncoding /WinAnsiEncoding

For Polish characters in win-1250 encoding it looks like this:

/BaseEncoding /WinAnsiEncoding
/Differences [
143 /Zacute
159 /zacute
140 /Sacute
156 /sacute
202 /Eogonek
234 /eogonek
198 /Cacute
230 /cacute
209 /Nacute
241 /nacute
211 /Oacute
243 /oacute
163 /Lslash
179 /lslash
165 /Aogonek
185 /aogonek
175 /Zdotaccent
191 /zdotaccent
]

The syntax is:
character_code /glyph_name
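
The entries above can also be generated instead of typed by hand. Below is a minimal sketch in Python, not part of PDF Include; the names POLISH_GLYPHS and differences_block are made up for this illustration. It assumes the glyph names are already known and only looks up the character code of each letter in the chosen code page.

```python
# Glyph names from the /Differences array above, paired with the
# Unicode character each one stands for (hypothetical helper table).
POLISH_GLYPHS = {
    "Zacute": "Ź", "zacute": "ź",
    "Sacute": "Ś", "sacute": "ś",
    "Eogonek": "Ę", "eogonek": "ę",
    "Cacute": "Ć", "cacute": "ć",
    "Nacute": "Ń", "nacute": "ń",
    "Oacute": "Ó", "oacute": "ó",
    "Lslash": "Ł", "lslash": "ł",
    "Aogonek": "Ą", "aogonek": "ą",
    "Zdotaccent": "Ż", "zdotaccent": "ż",
}


def differences_block(glyphs, codepage="cp1250"):
    """Return the lines of a /Differences array for the given code page."""
    lines = ["/Differences ["]
    for glyph_name, char in glyphs.items():
        # In a single-byte code page the encoded letter is exactly one byte,
        # and that byte value is the character code PDF expects.
        code = char.encode(codepage)[0]
        lines.append(f"{code} /{glyph_name}")
    lines.append("]")
    return lines


if __name__ == "__main__":
    print("\n".join(differences_block(POLISH_GLYPHS)))
```

Passing a different code page name, for example "iso8859_2", should give the codes for that encoding, provided every letter exists in it.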

It is enough to replace the character codes with the proper national characters. With the proportional fonts, Times-Roman and Helvetica, there is usually another problem: a replaced character might have a different width than the original, so your text might look bad. To solve that you must provide a width definition for the standard font. It does not seem possible to define widths for just a few characters; at least I don't know how to do it.
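
A PDF font dictionary describes widths with /FirstChar, /LastChar and a /Widths array that holds one entry for every code in that range, which is probably why a few characters cannot be defined on their own. The sketch below (Python, with hypothetical helper names build_widths and widths_entries) shows one way to think about it: start from the full width table of the base font and overwrite only the remapped codes. The numbers in the example are made up; real widths have to come from the font metrics.

```python
def build_widths(base_widths, first_char, overrides):
    """Copy base_widths and replace the entries for the remapped codes.

    base_widths[i] is the width of character code first_char + i.
    overrides maps a character code to the width of its new glyph.
    """
    widths = list(base_widths)
    for code, width in overrides.items():
        index = code - first_char
        if not 0 <= index < len(widths):
            raise ValueError(f"code {code} is outside the width table")
        widths[index] = width
    return widths


def widths_entries(widths, first_char):
    """Format the three font dictionary entries that describe the widths."""
    last_char = first_char + len(widths) - 1
    values = " ".join(str(w) for w in widths)
    return [
        f"/FirstChar {first_char}",
        f"/LastChar {last_char}",
        f"/Widths [ {values} ]",
    ]


if __name__ == "__main__":
    # Made-up widths for codes 163..166; only code 165 gets a new width.
    base = [500, 600, 700, 800]
    print("\n".join(widths_entries(build_widths(base, 163, {165: 650}), 163)))
```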

Have a look at the differences between two files:
pdf.pdf - regular PDF file
pdf_pl.pdf - the same file with replaced Polish characters
You can save them and use a diff utility to find the differences, since PDF is largely a text format.
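
If no diff utility is at hand, the comparison can be sketched with Python's standard difflib module. The function name pdf_text_diff is made up for this example, and it assumes both files are mostly uncompressed text, as the two sample files are said to be.

```python
import difflib


def pdf_text_diff(original, modified):
    """Return unified-diff lines between two PDF files given as bytes."""
    # Split on the raw bytes first, then decode each line as latin-1:
    # latin-1 maps every byte to one character, so decoding never fails.
    a = [line.decode("latin-1") for line in original.splitlines()]
    b = [line.decode("latin-1") for line in modified.splitlines()]
    return list(difflib.unified_diff(a, b, "pdf.pdf", "pdf_pl.pdf", lineterm=""))


def diff_pdf_files(path_a, path_b):
    """Read two files in binary mode and return their unified diff."""
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        return pdf_text_diff(file_a.read(), file_b.read())
```

Calling diff_pdf_files("pdf.pdf", "pdf_pl.pdf") and printing the returned lines should show the added /Differences entries and any changed width definitions prefixed with "+".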

Note that this works only in newer versions of Acrobat Reader. As far as I know, Times-Roman with replaced characters is handled properly from version 5 and Helvetica from version 6.

Implementation in PDF Include

I've modified just one file to implement this idea: pdf_inc.p. The original version comes from PDFinclude 3.3.3. I've added two procedures: pdf_font_diff_width and pdf_set_base14_codepage. I've also modified the /BaseEncoding section. Modifications are marked in comments with: "tj - added February 10, 2007".

I've also:
- commented out the annoying message in pdf_set_GraphicX and pdf_set_GraphicY
- added a new parameter, ColumnVerticalPadding, used in the modified pdftool.p

How to use it

Call pdf_set_base14_codepage like this:
RUN pdf_new_page("Spdf").
RUN pdf_set_base14_codepage("Spdf", SESSION:CPINTERNAL ).
Remember to use the encoding from SESSION:CPINTERNAL, because pdf_inc.p opens the stream as "binary no-convert". Have a look at the example pdf_pl_1250.p; this procedure creates the pdf_pl.pdf file mentioned above.

How to implement other languages

This is an implementation for Polish national characters only. Have a closer look at the pdf_set_base14_codepage procedure and extend "CASE pdfCodePage" with your encoding, creating the proper substitutions in TT_pdf_diff records and setting the proper character widths using the pdf_font_diff_width procedure.

Determining the so-called glyph names for your national characters can sometimes be difficult. My suggestion is:
- find out your encoding at http://www.kostis.net/charsets
- check the ISO/UTF8 character name
- find that name on adobe.com using advanced search in Google (it works better than the search engine on adobe.com)

For example: I want to find out the glyph name for the letter Ą in win-1250 encoding
- I open this encoding: http://www.kostis.net/charsets/cp1250.htm
- ISO/UTF8 name is "LATIN CAPITAL LETTER A WITH OGONEK"
- I use Google: http://www.google.com/search?as_q=LATIN+CAPITAL+LETTER+A+WITH+OGONEK&as_sitesearch=adobe.com
- the glyph name is "Aogonek" (mind that names are case sensitive)
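
The first two steps, finding the character code and the official character name, can also be done offline. This is a small Python sketch using the standard unicodedata module; the function name describe is made up here. It does not produce the Adobe glyph name itself, which still has to be searched for as described above.

```python
import unicodedata


def describe(char, codepage="cp1250"):
    """Return (code in the code page, Unicode character name) for one letter."""
    code = char.encode(codepage)[0]  # single-byte code page: one byte per letter
    return code, unicodedata.name(char)


if __name__ == "__main__":
    for letter in "Ął":
        print(letter, *describe(letter))
```

For Ą this gives code 165 and the name "LATIN CAPITAL LETTER A WITH OGONEK". For ł it gives 179 and "LATIN SMALL LETTER L WITH STROKE", while the glyph name used in the /Differences array is "lslash", so the name cannot always be guessed from the character name alone.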

Other resources

Search for PDFReference.pdf - about 1000 pages about PDF
All Polish national characters in all encodings

Enjoy!

Tomasz Judycki