[pdftex-Bugs][4321] Illegal entry in bfrange block in ToUnicode CMap
Bugs item #4321, was opened at 2010-11-25 15:21
Status: Open
Priority: 3
Submitted By: Heiko Oberdiek (oberdiek)
Assigned to: Nobody (None)
Summary: Illegal entry in bfrange block in ToUnicode CMap
Category: None
Group: None
Resolution: Accepted

Initial Comment:
Hello,

pdfTeX complains
  Error: Illegal entry in bfrange block in ToUnicode CMap
for valid cmap entries, when a PDF file is included.

The CMap entries are, for example:
  1 beginbfrange
  <0041><0041><0041>
  endbfrange

The error disappears in case of
  1 beginbfrange
  <41><41><0041>
  endbfrange

The error is in function CharCodeToUnicode::parseCMap1 in file
  libs/xpdf-3.02/xpdf/CharCodeToUnicode.cc

In case of poppler the problem is already reported with patch:
  http://lists.freedesktop.org/archives/poppler-bugs/2010-April/004931.html

The appended test file can be processed by
"pdftex --ini", "pdftex" or "pdflatex".

Yours sincerely
  Heiko

----------------------------------------------------------------------
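For illustration, the mapping that a single bfrange entry describes can be sketched as follows. This is a minimal Python sketch, not the xpdf or poppler code; the name expand_bfrange is hypothetical, and the sketch assumes the simple three-string form with a destination that fits in one UTF-16 code unit. It reads hex strings of any length, so <41> and <0041> denote the same source code.

```python
import re

# Matches one hex string such as <41> or <0041>.
HEX_TOKEN = re.compile(r"<([0-9A-Fa-f]+)>")


def expand_bfrange(entry):
    """Expand one '<lo><hi><dst>' bfrange entry into {code: text}.

    Assumption: dst is a single BMP code point (one UTF-16 code unit).
    The array form '[<..> <..>]' is not handled in this sketch.
    """
    tokens = HEX_TOKEN.findall(entry)
    if len(tokens) != 3:
        raise ValueError("expected three hex strings: %r" % entry)
    lo, hi, dst = (int(t, 16) for t in tokens)
    if hi < lo:
        raise ValueError("range end below range start: %r" % entry)
    # Each source code in [lo, hi] maps to dst plus its offset from lo.
    return {code: chr(dst + (code - lo)) for code in range(lo, hi + 1)}
```

Under this lenient reading, <0041><0041><0041> and <41><41><0041> describe the same mapping of code 0x41 to "A".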
Comment By: Taco Hoekwater (taco)
Date: 2010-11-26 08:19

Message:
ToUnicode is a little odd because it uses CMap syntax with a few
extra limitations that are only in the pdf reference, and these
seem to come from a really weird bit of Acroread implementation
code.

I have not looked at the input closely, so I could be missing the
point a little, but this could be the problem:

The hex number scanning in AR is closely related to the
begincodespacerange ... endcodespacerange block. If the code space
range is one byte, then all hex numbers have to be specified in two
digits, and if the code space range is two bytes, then all further
hex numbers have to be given in four digits.

----------------------------------------------------------------------

Comment By: The Thanh Han (hanthethanh)
Date: 2010-11-26 03:32

Message:
We would apply the mentioned patch from poppler.

Regarding the case
  <0041><0041><0042>
it works fine with Preview (osx) and acrobat 9, so I think it's a
browser issue.

Thanh

----------------------------------------------------------------------

Comment By: Heiko Oberdiek (oberdiek)
Date: 2010-11-25 15:48

Message:
Hello,

I have made further experiments by replacing the last <0041> by
<0042>. The "A" of the input file should then get converted to "B"
by copy&paste. This works for the line
  <41><41><0042>
with AR7/Linux, however it fails ("A" instead of "B") in case of
  <0041><0041><0042>

The PDF specification shows in section "5.9 Extraction of Text
Content" entries with four hexadecimal digits.

Can someone shed some light on this obscurity?

Yours sincerely
  Heiko

----------------------------------------------------------------------

You can respond by visiting:
http://sarovar.org/tracker/?func=detail&atid=493&aid=4321&group_id=106
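The rule suggested in Taco Hoekwater's comment can be sketched as a small check. This is a hedged Python illustration of one reading of that comment, not confirmed Acrobat Reader behaviour: it assumes the rule applies to the two source codes of each bfrange entry and that the width is taken from the first codespacerange string. The names codespace_bytes and width_mismatches are hypothetical.

```python
import re

HEX = r"<([0-9A-Fa-f]+)>"


def codespace_bytes(cmap):
    """Byte width of the first codespacerange entry (two hex digits per byte)."""
    m = re.search(r"begincodespacerange\s*" + HEX, cmap)
    if m is None:
        raise ValueError("no codespacerange block found")
    return len(m.group(1)) // 2


def width_mismatches(cmap):
    """Return (lo, hi) source codes whose digit count differs from the codespace."""
    want = 2 * codespace_bytes(cmap)
    bad = []
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap, re.S):
        # Only the simple '<lo><hi><dst>' form is considered in this sketch.
        pattern = HEX + r"\s*" + HEX + r"\s*<[0-9A-Fa-f]+>"
        for lo, hi in re.findall(pattern, block):
            if len(lo) != want or len(hi) != want:
                bad.append((lo, hi))
    return bad
```

For a CMap with a one-byte code space, this check would flag <0041><0041><0042> and accept <41><41><0042>, which matches the AR7/Linux observation in the thread only if the test file declares a one-byte code space; the thread does not state that.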