Feature Requests item #429, was opened at 2005-09-11 22:37
Status: Closed
Priority: 2
Submitted By: Timothy O'Brien (oberon101)
Assigned to: Martin Schröder (oneiros)
Summary: Generate Tagged PDF
Category: None
Group: None
Resolution: None
Initial Comment:
Adobe Reader has a 'reflow' feature that allows visually impaired users to zoom into properly formatted documents and read them without having to move the viewable area back and forth across the page. PDFs made in MiKTeX with pdftex are not properly formatted and reflow without interword spacing, rendering them unreadable. I contacted the MiKTeX people and they referred me here. I believe Scientific Word also uses pdftex with the same result. Any chance this could be fixed?

----------------------------------------------------------------------
Comment By: The Thanh Han (hanthethanh)
Date: 2010-12-19 04:12

Message:
See the svn branch tagged-pdf at supelec.

----------------------------------------------------------------------

Comment By: The Thanh Han (hanthethanh)
Date: 2008-04-27 10:22

Message:
Here is an attempt:
http://sarovar.org/tracker/index.php?func=detail&aid=945&group_id=106&atid=495

----------------------------------------------------------------------

Comment By: Nobody (None)
Date: 2006-05-16 11:49

Message:
Logged In: NO

" The main problem with reflowing is not the missing tags but that pdftex writes interword spaces as a kern (since there is no "space" in TeX, of course). "

No, the main problem is that Adobe Reader dropped recognition of "words" based on *spacing* and resorted to the simplistic approach of using *spaces* instead. This is simply ignorance of Adobe about TeX. Instead of whining about this one can do something for a practical cure: add zero-width dummy spaces at the end of every word, each consisting of a *space* and a kern -\wd(space). The space could come from any font producing explicit spaces in the output (e.g. cmtt). It is most easily done with VFs. As far as I remember, a real implementation already exists in VTeX (with some option).

" non-trivial:
- first pdfTeX would have to be extended with primitives for a structure tree (and classes and packages would have to use these primitives)
- then primitives for tagging the content are needed and must be used "

i) When generating PS code and distilling via Adobe Distiller, there should be no problem taking care of pdfmarks created by classes & packages...

ii) I consider the major problem to be page building and the page dependency of marked content. It is just like the difficulty of getting consistent color in TeX: one can consider color as a sort of "tag". E.g. a marked TeX paragraph is broken across pages with (tagged) headers, and the structure should in general be kept linked, also under the condition of reordered pages.
I suggest first defining what functionality is required on the side of the PDF reader (text extraction, save as XML, or audio output via screen reader). The packages should then be able to save LaTeX macro structure as pdfmarks for PDF tags as a prerequisite. Support for page breaking of tagged objects will require some assistance from pdftex, like line breaking of web links.

HS

----------------------------------------------------------------------

Comment By: Robert (schlcht)
Date: 2006-05-06 15:20

Message:
Logged In: YES
user_id=2217

The main problem with reflowing is not the missing tags but that pdftex writes interword spaces as a kern (since there is no "space" in TeX, of course).

A rather simple but effective way would be to write the interword spaces in a different font (e.g. non-embedded Times-Roman), and then compensate for the difference between Times's width of space and the width of the glue calculated by TeX. (At least, this is what Distiller does, if you select "Advanced -> Accessibility -> Add Tags to Document".)

So that

(This)-419(is)-420(an)-419(example)

will be turned into:

/T1_0 1 Tf
(This)Tj
/T1_1 1 Tf
( )Tj
/T1_0 1 Tf
2.369 0 Td
(is)Tj
/T1_1 1 Tf
( )Tj
/T1_0 1 Tf
1.092 0 Td
(an)Tj
/T1_1 1 Tf
( )Tj
/T1_0 1 Tf
1.475 0 Td
(example)Tj

where T1_0 is cmr10 and T1_1 is Times-Roman.

This would already be a major enhancement with respect to accessibility without any packages being required.

----------------------------------------------------------------------

Comment By: Nobody (None)
Date: 2006-04-14 11:33

Message:
Logged In: NO

Maybe a first version can use a very low-level solution, just with a single tagging primitive; there is such a command already (I think it is called pdfliteral). And the tree can come later. So one could start with little work, assuming one knows tagging.
CS

----------------------------------------------------------------------

Comment By: Martin Schröder (oneiros)
Date: 2006-04-14 11:12

Message:
Logged In: YES
user_id=421

I'm changing the summary.

Yes, we are aware that tagged pdf is an often requested feature, but implementing it would be non-trivial:
- first pdfTeX would have to be extended with primitives for a structure tree (and classes and packages would have to use these primitives)
- then primitives for tagging the content are needed and must be used

----------------------------------------------------------------------

Comment By: Nobody (None)
Date: 2005-11-04 19:58

Message:
Logged In: NO

I would volunteer to test the feature. I am writing a rather long pdf produced with pdftex that is downloadable for free ( http://www.motionmountain.net ) and readers regularly ask why it cannot be read aloud. Pdftex probably would only need to be extended with a single command - something like \writetaghere{tagtype} - and all the rest could be done by extensions to the latex cls and sty files.

CS

----------------------------------------------------------------------

You can respond by visiting:
http://sarovar.org/tracker/?func=detail&atid=496&aid=429&group_id=106
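
Robert's suggestion in the 2006-05-06 comment (show an explicit space in a second font after each word, then reposition the next word with Td) can be sketched in Python. This is only an illustration under stated assumptions, not pdfTeX or Distiller code: the function name `kerns_to_spaces`, the helper `word_width`, and the width table `ASSUMED_WIDTHS` are hypothetical, the glyph widths are rounded cmr10-like values chosen for the example, strings are assumed to contain no escaped parentheses, and every adjustment in the input is treated as an interword gap. Because Td is measured from the start of the current line rather than from the current text position, the offset for each word is the width of the previous word plus the glue, and the width of the space glyph itself does not matter.

```python
import re

# Assumed advance widths in 1/1000 text-space units (rounded, cmr10-like).
# Illustration only; a real tool would read them from the font metrics.
ASSUMED_WIDTHS = {"T": 722, "h": 556, "i": 278, "s": 394, "a": 500, "n": 556}


def word_width(word, widths):
    """Sum the advance widths of the glyphs in one word (1/1000 units)."""
    return sum(widths[ch] for ch in word)


def kerns_to_spaces(tj_body, widths, text_font="/T1_0", space_font="/T1_1"):
    """Turn '(This)-419(is)...' into Tj/Td operators with explicit spaces."""
    # Each match is (string, adjustment); the adjustment is '' after the last string.
    items = re.findall(r"\(([^()]*)\)(-?\d+)?", tj_body)
    ops = [f"{text_font} 1 Tf"]
    for index, (word, kern) in enumerate(items):
        ops.append(f"({word})Tj")
        if index == len(items) - 1:
            break  # nothing follows the last word
        # A negative adjustment moves the next glyph to the right.
        gap = -int(kern) if kern else 0
        # Show a real space glyph so that readers can detect the word boundary.
        ops.append(f"{space_font} 1 Tf")
        ops.append("( )Tj")
        ops.append(f"{text_font} 1 Tf")
        # Td is relative to the start of the previous word (the line origin).
        offset = word_width(word, widths) + gap
        ops.append(f"{offset / 1000:.3f} 0 Td")
    return ops


if __name__ == "__main__":
    for op in kerns_to_spaces("(This)-419(is)-420(an)-419(example)", ASSUMED_WIDTHS):
        print(op)
```

With the assumed widths, the Td offsets come out as 2.369, 1.092 and 1.475, matching the operator sequence quoted in the comment. Kerning pairs inside a word would need to be told apart from interword glue (for example by a size threshold) before anything like this could be used on real output.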