On 10/4/2018 7:17 AM, Ross Moore wrote:
Hi Hans,
On 4 Oct 2018, at 4:01 am, Hans Hagen <j.hagen@xs4all.nl> wrote:
It is important to get this fixed, else many otherwise perfectly fine documents will fail Preflight validation checks for modern PDF standards.
one can argue that the validator is wrong ... the subsetted basefont that is used is the same but it's accessed with different encodings which refer to the same subsetted font
It’s not a validator so much as specialised checking for things that might be issues with some software. See the image for the tests being run.
I’m mostly interested in the “mapped to Unicode” check, and have found hundreds of instances from various TeX fonts where this has not been so. In particular, I have a new file glyphtounicode-ntx.tex which contains mappings for glyphs used in the newtx fonts into unicode equivalents, often as sequences rather than a single code point. Indeed your example below has helped identify 6 more entries for this file.
hm, i can send you a list of all glyph names of the fonts on my system, a couple of thousand, (if needed i can do that for tex live in a vm)
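For readers unfamiliar with how such glyph-to-Unicode mappings are declared, here is a minimal sketch for pdfTeX with the plain format. It assumes a TeX Live installation with the standard glyphtounicode.tex and the TeX Gyre fonts available. The two extra glyph names and their targets are hypothetical, chosen only to show the single-code-point and sequence forms; they are not entries from glyphtounicode-ntx.tex.

```tex
% Sketch only: assumes pdfTeX (plain format) and a TeX Live font setup.
\input glyphtounicode   % standard glyph-name table shipped with pdfTeX
\pdfgentounicode=1      % ask pdfTeX to write ToUnicode CMaps for Type 1 fonts

% Hypothetical additions, for illustration only.
\pdfglyphtounicode{myglyph.alt}{0041}           % one code point: U+0041
\pdfglyphtounicode{my_lig_ffi}{0066 0066 0069}  % a sequence: f, f, i

\font\fonta=ec-qplr     % one of the fonts used in the test file in this thread
{\fonta Test}\par
\bye
```

Any glyph whose name has no entry is left without a ToUnicode mapping, which is presumably the kind of gap the "mapped to Unicode" check reports.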
Most of those checks are not done with PDF/A or PDF/UA. However, that doesn’t preclude them from future standards.
Having consulted the ISO 32000 (PDF 1.7) standard, the relevant sections are
• §9.6.4 (Font Subsets), where it says: "… different subsets in the same PDF file shall have different tags."
and
• §9.6.2, Table 111 – Entries in a Type 1 font dictionary, under BaseFont: "For Type 1 fonts, this is always the value of the FontName entry in the font program;"
This condition on BaseFont fixes the name to match what is used in the actual subset font. It seems that TeX is building further subsets of an already subsetted font. The names for these are then set by the requirement to match the common BaseFont name. But that then violates the requirement for different tags with different subsets; this is what the test is reporting.
it depends on how a subset is defined ... i think a subset is the font descriptor object which has a subset of glyphs ... the other one is an instance with for instance different widths

<< /Type /Font
   /Subtype /Type1
   /BaseFont /AXUCRW+TeXGyrePagella-Regular
   /FontDescriptor 37 0 R
   /FirstChar 0
   /LastChar 255
   /Widths 27 0 R
   /Encoding 26 0 R
>>

imagine that the requirement would be a different name, then one also needs a complete extra subset of shapes for each such instance as the fontname needs to match basefont
So it seems that this further subsetting of an already subset font is a rather suspect action…
no, it's not further subsetting: [instance with set of glyphs + widths] => font resource which is subset of pfb and access is by glyph name where the subset of pfb can have more than 256 characters (as many fonts have)
… though I can’t see much alternative, when you want Type 1 subsets of a font requiring more than 256 characters to be supported, in all. Maybe we shouldn’t be doing that? What would this mean for other parts of TeX?
It's mostly backend: if you really want that it means that *each* font instance also has a full set of shapes which means that, when you use fonts with different width specs (as i think hz in pdftex does: creating new font instance) or even when one uses a virtual font referring to some other one but with different widths, you get pretty large files. Then I assume that distiller would also do that for such cases.
Since neither PDF/A nor PDF/UA actually checks for this, there is no real problem yet. There are no explicit tests for PDF 1.7 compliance. It’s not clear whether this will become an issue with future standards, e.g. with PDF 2.0. Time will tell. Yet it’s probably worth thinking about in advance.
indeed, let's not worry about it now (personally i don't worry anyway as (1) we never run into validation issues, only occasionally in validators messing up files, and (2) we always use wide fonts)
a larger test:
Actually I encountered the issue with complete font-tables for the fonts in newtx (both newtxmath and newtxtext). Karl requested a smaller example so that is what I provided.
\font\fonti =fxlri-7alt
\font\fontii=fxlri-7letters
{\fonti \char"04}\par
{\fontii A B C}\par
% typeset all 256 slots of the current font
\count99=0
\def\foo{ \char\count99
  \advance\count99 1
  \ifnum\count99<256 \expandafter\foo \fi}
% the same pfb accessed through five different encodings
{\font\fonta=texnansi-qplr \fonta \foo}\par
{\font\fonta=ec-qplr \fonta \foo}\par
{\font\fonta=qx-qplr \fonta \foo}\par
{\font\fonta=t5-qplr \fonta \foo}\par
{\font\fonta=qx-qplr-sc \fonta \foo}\par
\bye
This tests a lot of glyphs, thanks. It helped me find some new glyph names without a known map to Unicode.
these fonts are the same as the opentype so also the same glyph names
this will still use one font resource with > 256 characters (and a bit bogus encoding in the pfb blob which actually can be StandardEncoding)
Surely this is true only for a subset of the font. Assistive Technology needs to extract proper characters (unless over-ridden by alternative text), so it would not be appropriate to declare StandardEncoding.
so it's a rather old situation i think ... and changing that will demand big changes (so not something done quick as one never knows what side effects occur) e.g. one font resource per subset (so larger files too)
Yes; I think that is what may need to be done, if a format or standard requires it. Just not yet.
(the same in luatex but there one will normally use wide fonts which don't have that problem and there are subtle differences between the engines)
The TeXGyre fonts seem to have 500+ glyphs. It may be that we are creating the problem for ourselves by using Type 1 encodings that are subsets limited to at most 256. This approach seems to work now, but is it reliable into the future?
(acrobat X - i have no newer version - preflights ok)
Many more tests have been added in the past few years. The interface changed significantly too, about the time when tests for PDF/UA were added.
I find it to be an extremely powerful tool, which is very easy to use. Given there’s no Open Source equivalent, I must say that it alone is well worth the licensing fee for Acrobat Pro.
Hans
Cheers.
Ross
Dr Ross Moore
Mathematics Dept | 12 Wally’s Walk, 734
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955 | F: +61 2 9850 8114 | M: +61 407 288 255 | E: ross.moore@mq.edu.au
http://www.maths.mq.edu.au
--
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------