Re: [pdftex] AR5 "Fit visible" bug

Martin Schroeder

3 Dec 2002

3:02 a.m.

On 2002-05-26 02:02:48 +0200, Heiko Oberdiek wrote:

...

look at the following simple LaTeX file:

\documentclass[a4paper]{article}
\begin{document}
\section{Hello World}
The begin of the line\hfill the end of the line
\end{document}

run it through pdflatex and view the result in the new version 5 of AcrobatReader/Unix. With the view mode "Fit visible" the result is very poor. The visible area is misplaced by a large amount.

Because "Fit visible" is my favourite view mode, this misbehaviour would prevent me from using AcrobatReader 5. Therefore I experimented to detect the cause of the problem:

* If there are lines (in the header, a box around the page), then the right margin is calculated correctly by AR, but the left margin is zero.
* A box without any text is displayed correctly.

The start of the pdf page stream:

stream
1 0 0 1 124.802 706.129 cm
BT
/F17 14.346 Tf
0 0 Td
[(1)-1125(H)1(ell)1(o)-375(W)94(orld)]TJ
...

The problem seems to be that AR does not take the current transformation matrix (CTM) into account when it scans the first Td operator in a BT..ET block while calculating the visible area.

So the following fixes the problem in horizontal direction:

1 0 0 1 0 706.129 cm
BT
/F17 14.346 Tf
124.802 0 Td
[(1)-1125(H)1(ell)1(o)-375(W)94(orld)]TJ
...
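That the two stream fragments place the text at the same spot can be checked numerically. The following is only a minimal sketch in Python, not pdfTeX code: it models a "cm" operator followed by the first "Td" of a text block, and the function name text_origin is made up for this illustration.

```python
def text_origin(cm, td):
    """Return the position of the text origin after "cm" and the first "Td".

    cm: the six operands (a, b, c, d, e, f) of a "cm" operator
    td: the two operands (tx, ty) of the first "Td" inside BT..ET
    """
    a, b, c, d, e, f = cm
    tx, ty = td
    # PDF maps a point (x, y) through the matrix as
    #   x' = a*x + c*y + e,   y' = b*x + d*y + f
    return (a * tx + c * ty + e, b * tx + d * ty + f)


# Stream as written by pdfTeX: the offset is in "cm", Td is "0 0".
original = text_origin((1, 0, 0, 1, 124.802, 706.129), (0, 0))
# Hand-edited stream: the horizontal offset moved from "cm" into "Td".
modified = text_origin((1, 0, 0, 1, 0, 706.129), (124.802, 0))

print(original == modified)
```

Both variants describe the same position, so a viewer that honours the CTM renders them identically; only a viewer that looks at the Td operands in isolation sees a difference.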

Some experiments (switching the view in AR) show that also the vertical component has to be fixed.

Therefore I have written a suggestion for a patch of pdfTeX, TeX/texk/web2c/pdftexdir/pdftex.ch:

The "cm" command before "BT" is written by pdf_set_origin, which is called from pdf_begin_text in pdftexdir/pdftex.ch. So the patch sets the current position to the lower left corner during the call of pdf_set_origin, so that the "cm" command moves to 0,0 before "BT". The operands of the first "Td" are then no longer calculated as "0 0", but contain the distance to the lower left corner, so that AR5 is happy. Some obscure calculation errors remain, e.g. in the detection of the right margin if there are only a few words, but for normal pages AR5 should now be usable in "Fit visible" view.

If the example above is compiled with dvips/ps2pdf or VTeX, then the pdf file shows no such displacement. So users will think that pdfTeX is not able to generate code that can be viewed properly, and will not consider this a clear bug of AR5. Therefore I see no way to avoid such a fix.

Here is my suggestion:

*** pdftex.ch.org Sun May 26 00:04:44 2002
--- pdftex.ch Sun May 26 00:04:53 2002
***************
*** 1519,1527 ****
--- 1519,1534 ----
  end;

  procedure pdf_begin_text; {begin a text section}
+ var temp_cur_h, temp_cur_v: scaled;
  begin
      if not pdf_doing_text then begin
+         temp_cur_h := cur_h;
+         temp_cur_v := cur_v;
+         cur_h := 0;
+         cur_v := cur_page_height;
          pdf_set_origin;
+         cur_h := temp_cur_h;
+         cur_v := temp_cur_v;
          pdf_print_ln("BT");
          pdf_doing_text := true;
          pdf_f := null_font;

I'm just scanning through my pdftex mailbox and found this. What became of it? It certainly isn't in the source yet.

Best regards
   Martin
-- 
Martin Schröder, MS@ArtCom-GmbH.DE
ArtCom GmbH, Grazer Straße 8, D-28359 Bremen
Voice +49 421 20419-44 / Fax +49 421 20419-10


Hartmut Henkel

7 Dec

2:03 p.m.

New subject: \pdfximage page/width exclusive

Hi,

\pdfximage page 1 {foo.pdf}

or

\pdfximage width 20mm {foo.pdf}

works, but

\pdfximage page 1 width 20mm {foo.pdf}

crashes (teTeX-20021116):

! Missing { inserted.
<to be read again>
                   w
\in #1->\hrule \noindent \pdfximage page 1 w
                                            idth 20mm {foo.pdf}\setbox ...

Greetings Hartmut

Hartmut Henkel

8 Dec

11:26 a.m.

New subject: JBIG2 News

Hi pdfTeX fans,

first, excuse the long e-mail. Here is some news on JBIG2 file inclusion with pdfTeX. I have put the newest experimental pdfTeX driver on my homepage, together with a PDF file containing the datastream example from the JBIG2 standard.

Status of Experimental JBIG2 Driver
-----------------------------------

Multiple JBIG2 images from a given JBIG2 file can be selected, one per call only, e. g.:

\pdfximage page 1 {foo.jb2}
\pdfximage page 2 {foo.jb2}
\pdfximage page 3 {foo.jb2}

In this case the page 0 object is stored only once. The case of optimum JBIG2 compression would be to include ALL images from a given JBIG2 file. Giving page/width together does not work, see my other e-mail.

The newest xpdf 2.0 can nicely display and print the PDF file generated from the full datastream example file (all three images!) from Annex H of the JBIG2 draft. Wow, how did they do it? But my Acrobat Reader ((R) by Adobe)

x86 linux 5.0.5 Apr 25 2002 11:55:36

crashes already at page/image 2, saying just `Abgebrochen' (German for "Aborted") :-( So most likely I have done something wrong. But where?

Some Remarks on JBIG2 Multiple Image Inclusion in pdfTeX
--------------------------------------------------------

Including multiple images from the same JBIG2 file gives some conceptual complications (similar problems might be known from PDF inclusion). How the JBIG2 file is organized is shown by an example (I hope my understanding is right), where the digits denote segments for numbered pages, EOP is the end-of-page flag, and EOF is the end-of-file flag. My understanding is that a certain page N segment X requires all info from any page 0 segment up to the segment number X. A JBIG2 file might look like:

00010111111EOP(1)22222220022EOP(2)0333303EOP(3)EOF

So if one wants to include the image from page 1, one needs all the page 0 segments up to EOP(1). If one wants page 2 also, one needs additional page 0 segments up to EOP(2). But already when writing the first page, the required page 0 info has to be written out as a PDF object.

The PDF definition seems to require that all page 0 info is collected in ONE PDF stream.

--- Or can one define a continuation stream in PDF? ---

This would mean that for writing an additional, later page (e. g. page 2), this page also has to be accompanied by its page 0 information---which is a waste of space in the PDF file, as part of the page 0 PDF info has already been written before.

What to do?

(1) Accompany any page N with all its page 0 information up to the EOP(N) flag. This is straightforward, but it increases the size of the PDF file, as the same page 0 information is included several times. So it voids part of the JBIG2 compression advantage.

(2) Scan once over the whole file and make one big page 0 object. Then reference this for any included page. This gives relatively small files if multiple images from the same file are included. But it might give increased PDF file size if only one image from the JBIG2 file is used.

(3) Planning ahead (= knowing in advance) which images (e. g. up to which page) will be included from a given JBIG2 file. Then the included page 0 segment could be kept at a minimum. Just take everything up to the maximum image number. Or utilize a real JBIG2 decoder (XPDF) to cleverly decide which segments are really needed for the image subset. But the decoder would have almost nothing to decode (the UNdecoded JBIG2 stream is included), it only would have to decide... :-)

How to remember that a page 0 is already written out for a given image? It seems that once an image object is written out, its img_name(img) and other structure info are forgotten. As a simple kludge, I remember it in a static (fixed-size, booo!) string storage within the write_jbig2() function. And _all_ page 0 information from the JBIG2 file is written, see case (2) above.

Inclusion of multiple images is slow on the JBIG2 reading side, as the JBIG2 file is scanned afresh from the beginning for every new image. There is no data structure optimizing this.

If one knew beforehand which images in total to include from a JBIG2 file, things could be optimized. If there were a TeX data structure available to the C side, telling about pages (e. g. page 2,3-5,12), one could include only the actually required page 0 info. If the pages were sorted in ascending order, one could loop through the pages within the write_jbig2() function with high JBIG2 reading speed.

All this is far from production use, obviously with errors, only for experimentation. I haven't looked into the xpdf sources yet (sorry, it's still too complicated for me). Is there anybody out there who can give me some hint about the above question marks, and what to do next?

Have fun!

Greetings Hartmut

------------------------------------------------------------------------
Dr.-Ing. Hartmut Henkel
In den Auwiesen 6, D-68723 Oftersheim, Germany
E-Mail: hartmut_henkel@gmx.de
http://www.circuitwizard.de
------------------------------------------------------------------------
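The trade-off between options (1) and (2) in the message above can be made concrete with the schematic file layout it gives. This is only an illustrative Python sketch: it counts the schematic segments (one digit each), not real JBIG2 bytes, and the helper names parse_layout, globals_needed and all_globals are invented here; none of this is part of the experimental driver.

```python
import re

# The schematic layout from the message: digits are segments tagged with
# their page number, EOP(n) ends page n, EOF ends the file.
EXAMPLE = "00010111111EOP(1)22222220022EOP(2)0333303EOP(3)EOF"


def parse_layout(layout):
    """Turn the schematic string into a list of tokens.

    Each digit becomes ("seg", page); "EOP(n)" becomes ("eop", n).
    The trailing "EOF" marker is matched but not stored.
    """
    tokens = []
    for match in re.finditer(r"EOP\((\d+)\)|EOF|(\d)", layout):
        if match.group(1) is not None:
            tokens.append(("eop", int(match.group(1))))
        elif match.group(2) is not None:
            tokens.append(("seg", int(match.group(2))))
    return tokens


def globals_needed(tokens, page):
    """Option (1): count the page 0 segments that occur before EOP(page)."""
    count = 0
    for kind, value in tokens:
        if kind == "eop" and value == page:
            return count
        if kind == "seg" and value == 0:
            count += 1
    raise ValueError(f"page {page} not found")


def all_globals(tokens):
    """Option (2): count every page 0 segment in the file."""
    return sum(1 for kind, value in tokens if kind == "seg" and value == 0)


tokens = parse_layout(EXAMPLE)
for page in (1, 2, 3):
    print(page, globals_needed(tokens, page))
print("all", all_globals(tokens))
```

In this schematic, including all three pages under option (1) writes 4 + 6 + 8 = 18 page 0 segments in total, while option (2) writes 8 once; including page 1 alone needs 4 under option (1) but still 8 under option (2).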

Martin Schroeder

9 Dec

12:46 a.m.

New subject: JBIG2 News

On 2002-12-08 19:26:06 +0100, Hartmut Henkel wrote:

...

But already when writing the first page, the required page 0 info is to be written out as PDF object. The PDF definition seems to require, that all page 0 info is collected in ONE PDF stream.

--- Or can one define a continuation stream in PDF? ---

No. But each image can have its own JBIG2Globals: "The stream _can_ be shared by multiple image XObjects whose JBIG2 encodings use the same global segments." (emph mine)

...

This would mean, that for writing an additional, later page (e. g. page 2), this page has to be accompanied also by its page 0 information---which is a waste of space in the PDF file, as part of the page 0 PDF info has been already written before.

This seems to be the only way. :-{ The PDF Reference says nearly nothing on multi-image files.

...

What to do?

(1) Accompany any page N with all its page 0 information up to the EOP(N) flag. This is straightforward, but it increases the size of the PDF file, as the same page 0 information is included several times. So it voids part of the JBIG2 compression advantage.

How large is a typical page 0 information compared to the actual images? [...]

...

(3) Planning ahead (= knowing in advance), which images (e. g. up to which page) will be included from a given JBIG2 file. Then the included page 0 segment could be kept at a minimum. Just take everything up to the maximum image number. Or utilize a real JBIG2 decoder (XPDF) to cleverly decide which segments are really needed for the image subset. But the decoder would have almost nothing to decode (the UNdecoded JBIG2 stream is included), it only would have to decide... :-)

This sounds like a two-pass approach, which we cannot do.

Best regards
   Martin

Hartmut Henkel

2:35 a.m.

New subject: JBIG2 News

On Mon, 9 Dec 2002, Martin Schroeder wrote:

...

...
--- Or can one define a continuation stream in PDF? --- No.

Wouldn't something like

/DecodeParms [<< /JBIG2Globals [6 0 R 7 0 R 8 0 R]>>]

and then

6 0 obj
<< /Length 59 >>
stream
The stream part 1... (the dots make the length right :-)
endstream

7 0 obj
<< /Length 97 >>
stream
The stream part 2...
endstream

8 0 obj
<< /Length 42 >>
stream
The stream part 3...
endstream

work? (Which could be made recursive.) Never tried this.

...

But each image can have its own JBIG2Globals: "The stream _can_ be shared by multiple image XObjects whose JBIG2 encodings use the same global segments." (emph mine)

Yes, that is how it's done now; 3 pictures use 1 global. I would even emphasize _same_: I don't know whether my method of one accumulated page 0 object is right. Now I read in the PDF ref. that when writing XObjects, their page number has to be reset to 1, but I haven't done this (it might be the reason for the Acroread crash, will test it). So the ordering of page association is taken out. Somehow a page object selects its page 0 objects not by any ordering, but by what it wants to do with them. I mean, if I include ALL page 0 segments, even the ones which came with later pages, then e. g. the 1st page would not stumble over the later ones.

...

...
This would mean, that for writing an additional, later page (e. g. page 2), this page has to be accompanied also by its page 0 information---which is a waste of space in the PDF file, as part of the page 0 PDF info has been already written before.

This seems to be the only way. :-{

Then the conceptual beauty of the multi-page JBIG2 images would be spoilt.

...

How large is a typical page 0 information compared to the actual images?

There is exactly ONE multi-page JBIG2 file I have (got it from William Rucklidge, it's also in the JBIG2 standard). So it's ideal for statistics :-) Overall page 0 is 68 bytes there, page 0 for page 1 is only 35 bytes. The page 1-3 streams have lengths 341, 271, 123. In THIS example there is no real advantage in having one page 0 for all pages. But who knows. They are always stating that JBIG2 is advantageous particularly for multi-page documents. (And that's where pdfTeX could do good work, e. g. take some good old fiction or math books without copyright, scan them, compress them with JBIG2, republish them online through pdfTeX.)
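The byte counts quoted above allow a quick back-of-the-envelope comparison. The following Python lines are only a sketch of that arithmetic; they ignore PDF object overhead, and since the message does not give the page 0 prefix sizes for pages 2 and 3, the duplicated case is computed as an upper bound only.

```python
GLOBALS_ALL = 68                # bytes of page 0 data for the whole file
GLOBALS_PAGE1 = 35              # bytes of page 0 data needed for page 1
PAGE_STREAMS = [341, 271, 123]  # bytes of the page 1-3 streams

# Only page 1 is included.
page1_minimal = GLOBALS_PAGE1 + PAGE_STREAMS[0]  # page 0 prefix for page 1
page1_shared = GLOBALS_ALL + PAGE_STREAMS[0]     # one big page 0 object

# All three pages are included.
all_shared = GLOBALS_ALL + sum(PAGE_STREAMS)     # one big page 0 object
# Upper bound if every page carried its own full copy of page 0.
all_duplicated_max = len(PAGE_STREAMS) * GLOBALS_ALL + sum(PAGE_STREAMS)

print(page1_minimal, page1_shared)      # 376 409
print(all_shared, all_duplicated_max)   # 803 939
```

With numbers this small the difference is at most about a hundred bytes either way, which matches the remark that this particular file shows no real advantage.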

...

This sounds like a two-pass approach, which we can not do.

I don't know what pdfTeX can request on its interface, e. g. Gimme picture 1, pause, gimme picture 2, or can it give a data structure: Picture 1 AND 2 please? E. g. when one writes page 1-3. Greetings Hartmut

Martin Schroeder

2:52 a.m.

New subject: JBIG2 News

On 2002-12-09 10:35:10 +0100, Hartmut Henkel wrote:

...

On Mon, 9 Dec 2002, Martin Schroeder wrote:

...
...
--- Or can one define a continuation stream in PDF? --- No.

Wouldn't something like

/DecodeParms [<< /JBIG2Globals [6 0 R 7 0 R 8 0 R]>>]

work? (Which could be made recursive.) Never tried this.

Most likely not. Table 3.10 says stream, not array. :-{
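For comparison, the form in which /JBIG2Globals takes a single indirect reference to one stream can be sketched as follows. This is an illustrative Python helper, not the driver's code: the function name image_xobject_dict is invented, the width and height are placeholders, and object number 6 is simply reused from the example above.

```python
def image_xobject_dict(width, height, length, globals_obj):
    """Return the dictionary text of a JBIG2 image XObject.

    globals_obj is the object number of the ONE stream that holds the
    page 0 segments; the value of /JBIG2Globals is an indirect reference
    to that stream, not an array of references.
    """
    return (
        "<< /Type /XObject /Subtype /Image\n"
        f"   /Width {width} /Height {height}\n"
        "   /ColorSpace /DeviceGray /BitsPerComponent 1\n"
        "   /Filter /JBIG2Decode\n"
        f"   /DecodeParms << /JBIG2Globals {globals_obj} 0 R >>\n"
        f"   /Length {length} >>"
    )


# Three images sharing the same globals stream (object 6).
# Width and height are placeholder values, not taken from the test file.
for stream_length in (341, 271, 123):
    print(image_xobject_dict(64, 56, stream_length, 6))
    print()
```

Sharing then means that several image dictionaries name the same object, which is the case the quoted sentence from the PDF Reference describes.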

...

...
This seems to be the only way. :-{

Then the conceptual beauty of the multi-page JBIG2 images would be spoilt.

I think it would be best to discuss this with some Adobe experts (we seriously need a contact there). This also in the hope that the standard might be changed/amended. [...]

...

...
This sounds like a two-pass approach, which we can not do.

I don't know what pdfTeX can request on its interface, e. g. Gimme picture 1, pause, gimme picture 2, or can it give a data structure: Picture 1 AND 2 please? E. g. when one writes page 1-3.

This should be possible. The interface (which one?) can be changed. :-)

Best regards
   Martin

Hartmut Henkel

4:47 a.m.

New subject: JBIG2 News

Now also Acroread can read all three JBIG2 test data stream pages. My fault: The page numbers were not set to 1 before writing the XObjects (XPDF seems to graciously ignore these page numbers). Greetings Hartmut

Martin Schroeder

12:31 a.m.

New subject: \pdfximage page/width exclusive

On 2002-12-07 22:03:14 +0100, Hartmut Henkel wrote:

...

\pdfximage page 1 {foo.pdf}

or

\pdfximage width 20mm {foo.pdf}

works, but

\pdfximage page 1 width 20mm {foo.pdf}

crashes (teTeX-20021116):

! Missing { inserted.
<to be read again>
                   w
\in #1->\hrule \noindent \pdfximage page 1 w
                                            idth 20mm {foo.pdf}\setbox ...

<quote src="pdftex-syntaxt.txt">
\pdfximage [<image attr spec>] <general text> (h, v, m)
<image attr spec> --> [<rule spec>] [<attr spec>] [<page spec>] [<pdf box spec>]
<rule spec> --> width <dimen> [<rule spec>]
<rule spec> --> height <dimen> [<rule spec>]
<rule spec> --> depth <dimen> [<rule spec>]
<attr spec> --> attr <general text>
<page spec> --> page <number>
</quote>

So yes,

\pdfximage page 1 width 20mm {foo.pdf}

is illegal; it must be

\pdfximage width 20mm page 1 {foo.pdf}

Best regards
   Martin

PS: The syntax is correct here; you can say "width 5mm width 6mm". :-)
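The ordering rule implied by the quoted grammar can be checked mechanically. The following Python sketch is an illustration only and has nothing to do with pdfTeX's actual scanner: it looks at keywords alone, the function names are invented, and the <pdf box spec> group is left out because the quoted excerpt does not list its keywords.

```python
RULE_KEYWORDS = {"width", "height", "depth"}
# Order of the optional groups as given in the quoted syntax
# (the pdf box spec group is omitted, see the note above).
GROUP_ORDER = ["rule", "attr", "page"]


def keyword_group(keyword):
    """Map a keyword to the grammar group it belongs to."""
    if keyword in RULE_KEYWORDS:
        return "rule"
    if keyword in ("attr", "page"):
        return keyword
    raise ValueError(f"unknown keyword: {keyword}")


def order_is_valid(keywords):
    """Check that the keywords appear in an order the grammar allows.

    Rule specs may repeat; attr and page may each appear at most once,
    and only after all rule specs.
    """
    last = -1
    for keyword in keywords:
        index = GROUP_ORDER.index(keyword_group(keyword))
        if index < last:
            return False  # a group appeared after a later group
        if index == last and GROUP_ORDER[index] != "rule":
            return False  # attr or page given twice
        last = index
    return True


print(order_is_valid(["page", "width"]))   # False
print(order_is_valid(["width", "page"]))   # True
print(order_is_valid(["width", "width"]))  # True
```

The first call corresponds to the failing "page 1 width 20mm", the second to the corrected "width 20mm page 1", and the third to the repeated "width 5mm width 6mm" from the PS.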

Hartmut Henkel

6:08 a.m.

New subject: \pdfximage page/width exclusive

The "width 5mm width 6mm" is totally correct by definition, due to TeX's correctness. E. g. you can also write \hbox width 10pt width 20pt. It's 20 pt long, you don't need to check :-) But I couldn't find a reference in the TeXbook about this feature and what the precedence is.

Greetings Hartmut

On Mon, 9 Dec 2002, Martin Schroeder wrote:

...

PS: The syntax is correct here; you can say "width 5mm width 6mm". :-)


Hartmut Henkel

6:13 a.m.

New subject: \pdfximage page/width exclusive

I meant \hrule width 10pt width 20pt, sorry.

On Mon, 9 Dec 2002, Hartmut Henkel wrote:

...

The "width 5mm width 6mm" is totally correct by definition, due to TeX's correctness. E. g. you can also write \hbox width 10pt width 20pt.

Hartmut Henkel

6:31 a.m.

New subject: \pdfximage page/width exclusive

Found it: TeXbook, ch. 21, p. 221: ``If you specify a dimension twice, the second specification overrules the first.'' Greetings Hartmut

Heiko Oberdiek

15 Jan

9:32 a.m.

New subject: [pdftex] AR5 "Fit visible" bug

On Tue, Dec 03, 2002 at 11:02:07AM +0100, Martin Schroeder wrote:

...

On 2002-05-26 02:02:48 +0200, Heiko Oberdiek wrote:

...
look at the following simple LaTeX file:

["Fit Visible" bug description snipped.]

What became of it? It certainly isn't in the source yet.

I have not found it in the latest teTeX-beta of 2003/01/12. Because it is very annoying to try reading documents with "Fit visible" and AR5/Unix, the patch should be included.

Yours sincerely
   Heiko

