On 12/18/2017 12:40 PM, Pali Rohár wrote:
On Monday 18 December 2017 11:17:45 Hans Hagen wrote:
It looks like pdftex currently generates the CMap from glyph names. Theoretically it should be possible to assign a fully unique glyph name to every glyph, possibly fully random, and then put the correct mapping for all character codes into the CMap table according to the enc file (the CMap table does not use glyph names).
that would confuse some viewers too (i remember some thread about non standard ffi ligature names and resolving hard coded in some viewer and the request for tex related fonts to conform to that bad practice too)
The first occurrence of a duplicate can use the originally specified glyph name, and the second, third, ... occurrences can use a new unique glyph name (with a proper CMap table). Yes, that would not fix the problem for those "some" viewers, but in this situation it is better than nothing.
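A minimal sketch of the renaming scheme described above, in Python rather than pdftex's own C code. The function name `uniquify_glyph_names` and the `.dupN` suffix are assumptions made for illustration; nothing in pdftex is known to use them. The sketch keeps the first occurrence of each name, renames later occurrences, and builds the code-to-Unicode mapping from the original names so that both duplicates copy as the same character.

```python
def uniquify_glyph_names(encoding, glyph_to_unicode):
    """Return (new_names, code_to_unicode) for an encoding vector.

    encoding         -- list of glyph names, index = character code
    glyph_to_unicode -- dict mapping a glyph name to a Unicode code point
    """
    seen = {}              # glyph name -> number of times seen so far
    new_names = []
    code_to_unicode = {}
    for code, name in enumerate(encoding):
        if name == ".notdef":
            # Repeated .notdef entries are legitimate; leave them alone.
            new_names.append(name)
            continue
        count = seen.get(name, 0)
        seen[name] = count + 1
        # First occurrence keeps its name; later ones get a hypothetical
        # ".dupN" suffix. A real implementation would also have to check
        # that the generated name does not collide with an existing one.
        new_names.append(name if count == 0 else f"{name}.dup{count}")
        # The mapping is keyed by character code, not by glyph name, so
        # every duplicate still resolves to the original character.
        if name in glyph_to_unicode:
            code_to_unicode[code] = glyph_to_unicode[name]
    return new_names, code_to_unicode


enc = ["a", "b", "b", ".notdef", ".notdef"]
names, cmap = uniquify_glyph_names(enc, {"a": 0x61, "b": 0x62})
print(names)
print(cmap)
```

Because the suffix follows a period, a viewer that falls back on glyph names and strips everything after the first period would still arrive at 'b' for the renamed glyph.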
Two 'same' names in an enc file not referring to the same glyph is a bugged enc file. Personally I would not use such a font.
File test.tex: ============ \pdfglyphtounicode{mychar}{269} \pdfgentounicode=1 \pdfmapline{cmb10
And the resulting PDF file would not render glyph 'a' if the function remove_duplicate_glyph_names() is disabled. There would be two glyphs 'b'.

but still i think that the fact that there are duplicate names in my.enc file is the real problem: if two b's refer to different shapes then what is the real 'b'? And what is the right new name: b.one, b.two ?
If you have two shapes for b, then you can assign the glyph name 'b' to only one shape in the final PDF. What you can do is create a CMap table where both characters are mapped to the Unicode code point for 'b'.
in that case the enc file should have dollar and dollar.oldstyle or b and b.smallcaps i.e. a proper name, not something arbitrary
PDF viewers which do not use the CMap would not be able to copy+paste properly. But this is the current situation, as /ToUnicode is not supported for Type3 fonts yet.
if one follows the adobe glyph name convention it should work ok (at least in acrobat, mupdf)
Anyway, exactly the same problem exists for Type 1 fonts. If you have two different shapes for b in a Type 1 font, then only one can have the glyph name 'b'.
i've never seen a type 1 font with two 'same names' for different shapes ... it would qualify as 'a font to avoid'
What does one expect with cut and paste?
The expected behavior for an ordinary user is simple: Both glyphs which are marked as 'b' should be copied as character 'b'.
It can work only in PDF viewers with correct CMap support. But with the current pdftex code it is not possible.
viewers can use the names instead
But you are right that this is a real problem. Some calligraphic fonts have more than one glyph for one character. And the decision about which glyph is used is based on the previous or next characters.
then there's something like a.varianta, a.variantb, a.variantc and a cut and paste will use the 'a' part to identify the name, just like f_f_i is a convention for a ligature
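The naming convention mentioned here can be sketched in Python, roughly following the rules of the Adobe glyph naming convention: drop everything after the first period, split the rest on underscores, and resolve each component. `SAMPLE_AGL` is a tiny stand-in table invented for this example (a real tool would load the full Adobe Glyph List), and `glyph_name_to_text` is a hypothetical helper, not a function of any viewer or of pdftex.

```python
import re

# Tiny stand-in for the Adobe Glyph List (assumption: illustration only).
SAMPLE_AGL = {"a": "a", "b": "b", "f": "f", "i": "i", "dollar": "$"}


def glyph_name_to_text(name, agl=SAMPLE_AGL):
    """Resolve a glyph name to text using the suffix/ligature conventions."""
    # "a.varianta" -> "a": the suffix after the first period is ignored.
    base = name.split(".", 1)[0]
    out = []
    # "f_f_i" -> ["f", "f", "i"]: underscores separate ligature components.
    for part in base.split("_"):
        if part in agl:
            out.append(agl[part])
        elif re.fullmatch(r"uni(?:[0-9A-F]{4})+", part):
            # "uniXXXX..." carries one or more 4-digit hex code units.
            digits = part[3:]
            out.append("".join(chr(int(digits[i:i + 4], 16))
                               for i in range(0, len(digits), 4)))
        elif re.fullmatch(r"u[0-9A-F]{4,6}", part):
            # "uXXXX" to "uXXXXXX" carries a single code point.
            cp = int(part[1:], 16)
            if cp <= 0x10FFFF:
                out.append(chr(cp))
        # Unknown components contribute nothing.
    return "".join(out)


for example in ("a.varianta", "f_f_i", "dollar.oldstyle", "mychar"):
    print(example, "->", repr(glyph_name_to_text(example)))
```

An arbitrary name such as `mychar` resolves to nothing under these rules, which is why a viewer that relies on names alone cannot recover the character without an explicit mapping.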
If two names are the same and they refer to the same font program then there is no problem and the first one encountered when embedding should be used.
If remove duplicates is an option in pdftex then at least make sure that it's off by default (better complain loudly on the console that the enc is broken)
Do you want this problem to be a fatal error?
Fatal in the sense that a viewer crashes? Sure. Then at least I know that the 'b' in a font is probably not a 'b'. Also, in that case it's a signal to avoid that font. (The same can be true for embedding fonts with bad font names that clash.) FYI: I decided (in context with luatex at least) to *not* use the fontloader but write one in lua that stays close to the original font and avoids the usual heuristics ... it's hard to fight (bad or fuzzy) heuristics as they obscure problems.
so that the user knows that enabling that option is not solving the problem (and in tex distributions the fixed enc should be used). Heuristics and fixes for bugged fonts are nice but not being able to bypass them is bad.
I thought it would be better to produce a PDF file, as the enc file itself does not change how the PDF file is rendered. It affects only copy+paste from the PDF file.
But why not fix the enc file?
(multiple .notdef is an exception)
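A check along the lines suggested in this thread (complain about a broken enc file instead of silently fixing it) could look roughly like the Python sketch below. It assumes the usual enc layout of `/Name [ /glyph ... ] def` with `%` comments; `find_duplicate_glyph_names` and the sample contents of `my.enc` are made up for illustration and are not part of pdftex.

```python
import re
from collections import defaultdict


def find_duplicate_glyph_names(enc_text):
    """Return {glyph name: [codes]} for names used in more than one slot."""
    # Strip "%" comments up to the end of each line.
    no_comments = re.sub(r"%.*", "", enc_text)
    # Keep only the names between the first "[" and the first "]".
    body = no_comments[no_comments.index("[") + 1:no_comments.index("]")]
    slots = defaultdict(list)
    for code, name in enumerate(re.findall(r"/([^\s/\[\]]+)", body)):
        slots[name].append(code)
    # Repeated .notdef entries are the one accepted exception.
    return {name: codes for name, codes in slots.items()
            if len(codes) > 1 and name != ".notdef"}


# Hypothetical, shortened stand-in for a broken my.enc.
sample = """% my.enc (illustrative contents only)
/MyEncoding [
/a /b /b /.notdef /.notdef
] def
"""

for name, codes in find_duplicate_glyph_names(sample).items():
    print(f"warning: glyph name '{name}' used for codes {codes}")
```

Running something like this over the enc files in a distribution would flag the files that need fixing at the source, which is the alternative to working around them at embedding time.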
A different, but maybe more interesting question is: What happens for other font formats if the supplied enc file contains duplicate names?

I can only speak for luatex: we don't use enc files for type 1 and opentype. And even for type 3 (which i never use) I'd avoid them. In fact, everything related to encodings is already dealt with when the font is defined (loaded), and an afm or pfb file is normally ok. Makes me wonder how these bad enc files can show up at all, as those type 3 fonts are very old school and therefore the problem of duplicate names for different shapes should also have been seen with dvips and so.
Hans ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl -----------------------------------------------------------------