Hi Hans, and others
On 26 Jan 2019, at 8:09 pm, Hans Hagen wrote:
PDF/A-2 and PDF/A-3 relax many of those "may not include"s, which are mostly things that TeX does support. The optionality of /CharSet is just another such relaxation.
> just wondering: do you see any technical advantage in this CharSet bit array, other than it being an option to predict maybe font memory allocation demands or so (which then in turn is useless as the pdf format has many aspects that will bloat memory usage anyway)

I can envisage a possible use for having this knowledge of which glyphs are available internally in a font subset. PDFs are now editable, at least in Acrobat Pro. So knowing what characters are available lets software easily determine whether a simple edit that changes or adds characters to a text block can simply be performed using the embedded font subset, or whether a font substitution is needed to do the specific edit.

Of course it is preferable to not have to substitute, as this can change the metrics, hence potentially making a noticeable change to the visual appearance of that text block. If you have ever tried to edit a PDF made by someone else (with TeX or Word or …) then you should have experienced how things can move around significantly within the same paragraph.
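To make that use case concrete, here is a minimal sketch in Python of the kind of check an editor could perform. It assumes the /CharSet value is available as a plain string of glyph names, each preceded by a slash; the function names `parse_charset` and `can_edit_with_subset` are made up for this illustration and do not come from any PDF library. Name escapes such as `#20` are not handled.

```python
def parse_charset(charset):
    """Split a /CharSet string such as '/a/b/agrave' into a set of glyph names."""
    return {name for name in charset.split("/") if name}


def can_edit_with_subset(charset, needed_glyphs):
    """Report whether every glyph needed by an edit is in the embedded subset.

    Returns (ok, missing), where 'missing' lists the glyph names that would
    force a font substitution.
    """
    available = parse_charset(charset)
    missing = sorted(set(needed_glyphs) - available)
    return (not missing, missing)


# Hypothetical subset containing only the glyphs T, e, X and space.
ok, missing = can_edit_with_subset("/T/e/X/space", ["T", "e", "x"])
print(ok, missing)
```

If `missing` is empty the edit can reuse the embedded subset and the metrics stay the same; otherwise the editor has to fall back to another font. This only works, of course, if the CharSet is accurate.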
Anyway, right now the choices are a) omit /CharSet or b) output a possibly-incorrect CharSet.
If there were a primitive that could control this, then that would potentially be enough, at least for the present. It would allow the CharSet to be omitted with PDF/A-2 and PDF/A-3 but included with PDF/A-1.
> in luatex it's an option

At what level? Can it be done on a font-by-font basis? That would be ideal.

If just a command-line option when calling lualatex then that is kind of workable. Essentially it would require a user to have done a preflight check and found that one of the fonts has a CharSet problem. Then rerun with the option set, to get a valid PDF/A-2 (or 3) document. It would be affecting all the Type-1 fonts, not just one of them. The ability (described above) to later edit the PDF would be lost pretty much entirely.
This distinction would need to be documented (in pdfx.pdf, say) so that authors can understand the issue and choose the appropriate package-loading option for their own circumstances. I’m happy to do this.
But I’ve not yet looked at how the subsetted font is constructed. My thought is that the latter needs to adjust the gl_tree before it is used. As I said previously, this will be a timing issue; so I’m not confident that I could correctly write the necessary coding, using programming structures that I don’t fully understand.
> i don't know about pdftex but it is something delayed to the last when the 'combined' font resource is added as different tex fonts using the same resource can get different entries (and width arrays) but share the blobs

My understanding of the code in writefont.c is that the Font Descriptor dictionary is constructed (and written) as a complete object, before the font subset itself is constructed. For the CharSet, the entries in gl_tree are used, based upon a list of the characters explicitly using that font. This does *not* include implicit glyphs, such as /grave (and perhaps /a) with /agrave. It was such a circumstance that initiated this conversation roughly a year ago.

I looked at solutions like writing the accent characters in white, outside the page boundaries, as an /Artifact say. But this begets a range of difficulties, and could potentially affect the pagination or typesetting, and can fail other accessibility checks. I want to develop reliable means to construct documents simultaneously for both Archivability and Accessibility.
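The implicit-glyph problem can be illustrated with a small Python sketch. This is not the logic in writefont.c, only a model of the idea: start from the glyphs that are used explicitly, then add every glyph they depend on, however the font expresses that dependency. The `components` table and the function name `charset_closure` are hypothetical and made up for this example.

```python
def charset_closure(used, components):
    """Return the explicitly used glyphs plus every glyph they depend on.

    'components' maps a composite glyph name to the glyph names it is built
    from; glyphs that are not keys are treated as having no dependencies.
    """
    closure = set()
    pending = list(used)
    while pending:
        glyph = pending.pop()
        if glyph in closure:
            continue  # already handled; also guards against cycles
        closure.add(glyph)
        pending.extend(components.get(glyph, ()))
    return closure


# Hypothetical dependency table: /agrave is drawn from /a and /grave.
COMPONENTS = {"agrave": ("a", "grave")}

used = {"agrave", "b"}                      # glyphs typeset explicitly
full = charset_closure(used, COMPONENTS)    # glyphs the subset really contains
missing_from_charset = sorted(full - used)
print(missing_from_charset)
```

A CharSet written from `used` alone would omit the names in `missing_from_charset` even though the subset font contains those glyphs, which is the kind of mismatch that an accuracy check would report.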
>> (I highly doubt that Thanh has time to look into this.) Sorry, but that's the reality. -k

> it's probably not that complex; i also doubt if the quality of that vector should be perfect as probably only its presence is checked, not its internal validity (which then would also demand checking fonts which afaik doesn't happen in detail); and i bet that viewers ignore its content anyway
From the veraPDF link that Reinhard provided, it seems that presence is checked with PDF/A-1, but not accuracy. But for PDF/A-2 and 3, there is a more detailed check for accuracy.

Perhaps true for viewers; but PDFs are becoming about *more* than just the visual view. We want to be providing the structures required for accurate text extraction and editing. TeX was never designed with this in mind, but because of its programmability this is something that should be achievable.

> Hans
> -----------------------------------------------------------------
> Hans Hagen | PRAGMA ADE
> Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
> tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
> -----------------------------------------------------------------

Cheers.

Ross