[NTG-pdftex] JBIG2 News

8 Dec 2002

      Hi pdfTeX fans,

first, please excuse the long e-mail. Here is some news on JBIG2 file
inclusion with pdfTeX. I have put the newest experimental pdfTeX driver
on my homepage, along with a PDF file containing the datastream example
from the JBIG2 standard.

Status of Experimental JBIG2 Driver
-----------------------------------

Multiple JBIG2 images from a given JBIG2 file can be selected, but only
one per call, e.g.:

\pdfximage page 1 {foo.jb2}
\pdfximage page 2 {foo.jb2}
\pdfximage page 3 {foo.jb2}

In this case the page 0 object is stored only once. JBIG2 compression
pays off best when ALL images from a given JBIG2 file are included.
Giving page and width together does not work; see my other e-mail.

The newest xpdf 2.0 nicely displays and prints the PDF file generated
from the full datastream example file (all three images!) from Annex H
of the JBIG2 draft. Wow, how did they do it?

But my Acrobat Reader ((R) by Adobe) x86 linux 5.0.5 Apr 25 2002
11:55:36 already crashes at page/image 2, saying just `Abgebrochen'
(German for "Aborted") :-( So most likely I have done something wrong.
But where?

Some Remarks on JBIG2 Multiple Image Inclusion in pdfTeX
--------------------------------------------------------

Including multiple images from the same JBIG2 file gives some conceptual
complications (similar problems might be known from PDF inclusion).

The following example shows how a JBIG2 file is organized (I hope my
understanding is right). Each digit denotes one segment and gives the
number of the page it belongs to, EOP is the end-of-page flag, and EOF
is the end-of-file flag. My understanding is that segment X of a page N
requires all info from every page 0 segment up to segment number X. A
JBIG2 file might look like:

00010111111EOP(1)22222220022EOP(2)0333303EOP(3)EOF

So if one wants to include the image from page 1, one needs all the page
0 segments up to EOP(1). If one wants page 2 also, one needs additional
page 0 segments up to EOP(2).

But already when the first page is written, the required page 0 info has
to be written out as a PDF object. The PDF definition seems to require
that all page 0 info is collected in ONE PDF stream.

--- Or can one define a continuation stream in PDF? ---

This would mean that an additional, later page (e.g. page 2) has to be
accompanied by its own page 0 information when it is written. That is a
waste of space in the PDF file, as part of the page 0 info has already
been written before.

What to do?

(1) Accompany any page N with all its page 0 information up to the
EOP(N) flag. This is straightforward, but it increases the size of the
PDF file, as the same page 0 information is included several times. So
it voids part of the JBIG2 compression advantage.

(2) Scan once over the whole file and make one big page 0 object. Then
reference this object for every included page. This gives relatively
small files if multiple images from the same file are included. But it
might increase the PDF file size if only one image from the JBIG2 file
is used.

(3) Plan ahead, i.e. know in advance which images (e.g. up to which
page) will be included from a given JBIG2 file. Then the included page 0
segments could be kept to a minimum: just take everything up to the
maximum image number. Or use a real JBIG2 decoder (xpdf) to cleverly
decide which segments are really needed for the image subset. The
decoder would have almost nothing to decode (the UNdecoded JBIG2 stream
is included); it would only have to decide... :-)
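
To get a feeling for the trade-off between (1), (2) and (3), here is a
rough Python sketch that counts segments for the schematic stream shown
earlier. It assumes, purely for illustration, that every segment has the
same size, and for (3) it uses the simple "everything up to the maximum
image number" variant. The names `scan` and `cost` are made up.

```python
import re

STREAM = "00010111111EOP(1)22222220022EOP(2)0333303EOP(3)EOF"

def scan(stream):
    """Return (own, shared, total0): own[p] = number of segments of page p,
    shared[p] = page 0 segments seen before EOP(p), total0 = all page 0 segments."""
    own, shared, zeros = {}, {}, 0
    for m in re.finditer(r"EOP\((\d+)\)|(\d)", stream):
        if m.group(1) is not None:
            shared[int(m.group(1))] = zeros
        elif m.group(2) == "0":
            zeros += 1
        else:
            page = int(m.group(2))
            own[page] = own.get(page, 0) + 1
    return own, shared, zeros

def cost(stream, pages, strategy):
    """Segments written to the PDF for the given pages under strategy 1, 2 or 3."""
    own, shared, total0 = scan(stream)
    image_part = sum(own[p] for p in pages)
    if strategy == 1:   # each page carries its own copy of page 0 up to EOP(p)
        return image_part + sum(shared[p] for p in pages)
    if strategy == 2:   # one object holding all page 0 segments of the file
        return image_part + total0
    if strategy == 3:   # one object, page 0 segments up to the highest page used
        return image_part + shared[max(pages)]
    raise ValueError("strategy must be 1, 2 or 3")

for pages in ([1], [1, 2], [1, 2, 3]):
    print(pages, [cost(STREAM, pages, s) for s in (1, 2, 3)])
```

With these toy numbers, including only page 1 costs 11 segments under
(1) and (3) but 15 under (2), while including all three pages costs 39
under (1) and 29 under (2) and (3).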

How to remember that page 0 has already been written out for a given
image? It seems that once an image object is written out, its
img_name(img) and other structure info are forgotten. As a simple
kludge, I remember it in a static (fixed-size, booo!) string storage
within the write_jbig2() function. And _all_ page 0 information from the
JBIG2 file is written, as in case (2) above.
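
The bookkeeping idea, without the fixed-size limit, might look like the
following Python sketch. It is not the code in the driver: the class
`Page0Registry`, the callback `write_page0` and the object number 17 are
all hypothetical, and only show a lookup keyed by file name.

```python
class Page0Registry:
    """Remembers, per JBIG2 file name, the PDF object number of its page 0 stream."""

    def __init__(self):
        self._objnum = {}   # file name -> object number; grows as needed

    def get_or_write(self, filename, write_page0):
        """Return the object number of the page 0 stream for filename.
        write_page0(filename) is called only the first time a file is seen."""
        if filename not in self._objnum:
            self._objnum[filename] = write_page0(filename)
        return self._objnum[filename]


# Usage: a stand-in writer that just records its calls.
calls = []

def fake_write_page0(filename):
    calls.append(filename)
    return 17               # made-up object number

registry = Page0Registry()
refs = [registry.get_or_write("foo.jb2", fake_write_page0) for _ in range(3)]
print(refs, calls)
```

Three images from foo.jb2 then all reference the same object, and the
page 0 data is written once.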

Inclusion of multiple images is slow on the JBIG2 reading side, as the
JBIG2 file is scanned afresh from the beginning for every new image.
There is no data structure optimizing this.

If one knew beforehand which images in total to include from a JBIG2
file, things could be optimized. If there were a TeX data structure
available to the C side, telling about pages (e.g. page 2,3-5,12), one
could include only the page 0 info that is actually required. If the
pages were sorted in ascending order, one could loop through the pages
within the write_jbig2() function with high JBIG2 reading speed.
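
As a sketch of that idea in Python (no such interface exists in pdfTeX;
the page-list syntax and the names `parse_pages` and `single_pass` are
assumptions for illustration), a page list could be expanded and sorted,
and the file then read only once:

```python
def parse_pages(spec):
    """Expand a page list such as '2,3-5,12' into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        lo, _, hi = part.strip().partition("-")
        pages.update(range(int(lo), int(hi or lo) + 1))
    return sorted(pages)

def single_pass(segment_pages, wanted):
    """segment_pages gives the page number of each segment in file order.
    Return the indices of the segments to copy, reading the list only once.
    The loop stops at the first segment of a page beyond the last wanted one,
    so page 0 segments directly after the last wanted page are still kept."""
    wanted = set(wanted)
    last = max(wanted)
    keep = []
    for index, page in enumerate(segment_pages):
        if page > last:
            break
        if page == 0 or page in wanted:
            keep.append(index)
    return keep

# The schematic stream from above, without its EOP/EOF flags.
segments = [int(c) for c in "00010111111" "22222220022" "0333303"]
print(parse_pages("2,3-5,12"))
print(single_pass(segments, [2]))
```

The first call gives the pages 2, 3, 4, 5 and 12 in ascending order.
The second one keeps the page 0 and page 2 segments and never touches
the segments of page 3.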

All this is far from production use, surely contains errors, and is
meant only for experimentation. I haven't looked into the xpdf sources
yet (sorry, they are still too complicated for me). Can anybody out
there give me a hint about the question marks above, and about what to
do next?

Have fun!

Greetings Hartmut

------------------------------------------------------------------------
Dr.-Ing. Hartmut Henkel
In den Auwiesen 6, D-68723 Oftersheim, Germany
E-Mail: hartmut_henkel@gmx.de
http://www.circuitwizard.de
------------------------------------------------------------------------