Patches item #580, was opened at 2006-07-14 20:57
You can respond by visiting:
http://sarovar.org/tracker/?func=detail&atid=495&aid=580&group_id=106

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: The Thanh Han (hanthethanh)
Assigned to: Nobody (None)
Summary: Patch to make ToUnicode for Type1 fonts

Initial Comment:
This is a patch to pdftex so that it can create ToUnicode entries for
Type1 fonts. The main purpose is to make ligatures and some other glyphs
(such as smallcap letters or oldstyle digits from OpenType fonts)
searchable.

This patch also contains a minor fix that allows the use of fonts without
embedding, for example MinionPro or MyriadPro (which are distributed with
Acrobat Reader >= 7.0, but whose use is restricted to Acrobat Reader
only).

How to apply:
~~~~~~~~~~~~~
- this patch applies to the pristine pdftex-1.40.0-beta-20060213 sources
  only; if you have applied any other patches to the sources, please
  discard them and start from fresh sources.

- how to apply:

,--------
| cd /path/to/pdftex-1.40.0-beta-20060213/src
| cat /path/to/the/patch | patch -p1
| ./configure
| cd texk/web2c
| make pdfetex
`--------

If you want to be careful, try the patch with the option '--dry-run'
first to see whether the patch can be applied without problems.

Usage:
~~~~~~
Add the following lines to your document, somewhere near the beginning:

,--------
| \input glyphtounicode.tex
| \pdfgentounicode=1
`--------

Customization:
~~~~~~~~~~~~~~
If pdftex cannot generate the right ToUnicode value for some glyphs
(probably because the glyph name is not ``known'' to pdftex), it's
possible to add further entries so pdftex can learn how to generate
unicode for such ``unknown'' glyphs.
The syntax is simple:

    \pdfglyphtounicode{<glyph-name>}{<unicode-value>}

Example:

    \pdfglyphtounicode{A}{0041}

says that glyph 'A' maps to Unicode U+0041.

The entries in glyphtounicode.tex cover the Adobe Glyph List
(glyphlist.txt version 2.0) and some additional glyphs (texglyphlist.txt
version 2.33, coming from lcdf-typetools), plus some additional entries
for ligatures.

If a glyph name cannot be found, pdftex does some simple name
translations:

- remove any ".xxx" suffix from the glyph name, where "xxx" is a string
  consisting of alphabetic characters. For example "A.sc" => "A"

- remove a suffix like "small", "oldstyle", "inferior" or "superior" from
  the glyph name. For example "Asmall" => "A"

The resulting name is then looked up again to find a unicode value.

Ligatures require a special form of ToUnicode. Example:

    \pdfglyphtounicode{ff}{00660066}

Here '0066' is the unicode string for 'f'. Some ligatures have names like
'f_f_i'; in such a case the command should be

    \pdfglyphtounicode{f_f_i}{006600660069}

i.e. '_' is removed from the glyph name, and then all letters are
translated to their unicode strings.

----------------------------------------------------------------------
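
The name translations and the ligature rule described in the initial
comment can be sketched in Python as follows. This is only an
illustration of the described behaviour, not pdftex's actual code: the
table contents, the function names (simplify, lookup, ligature_value)
and the choice to apply both translations one after the other are all
assumptions made for the example.

```python
import re

# Hypothetical stand-in for the table filled by \pdfglyphtounicode entries.
GLYPH_TO_UNICODE = {"A": "0041", "f": "0066", "i": "0069", "one": "0031"}

# Suffixes named in the description above.
SUFFIXES = ("small", "oldstyle", "inferior", "superior")


def simplify(name):
    """Apply the two simple name translations to a glyph name."""
    # Drop a trailing ".xxx" suffix of alphabetic characters: "A.sc" -> "A".
    name = re.sub(r"\.[A-Za-z]+$", "", name)
    # Drop one known suffix, but never reduce the name to an empty string.
    for suffix in SUFFIXES:
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)]
    return name


def lookup(name, table=GLYPH_TO_UNICODE):
    """Return the unicode string for a glyph name, or None if unknown."""
    if name in table:
        return table[name]
    # Fallback: look the simplified name up again.
    return table.get(simplify(name))


def ligature_value(name):
    """Build the value for a ligature name such as 'ff' or 'f_f_i'."""
    # Remove '_' and translate every letter to a 4-digit uppercase hex string.
    letters = name.replace("_", "")
    return "".join(format(ord(ch), "04X") for ch in letters)
```

With this sketch, lookup("A.sc") and lookup("Asmall") both resolve to the
entry for "A", and ligature_value("f_f_i") yields the string that would
be passed as the second argument of \pdfglyphtounicode.
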

Comment By: The Thanh Han (hanthethanh)
Date: 2006-10-07 09:44

Message:
Logged In: YES
user_id=710

Patch updated to fix a bug reported by Akira.

----------------------------------------------------------------------

Comment By: The Thanh Han (hanthethanh)
Date: 2006-10-06 11:34

Message:
Logged In: YES
user_id=710

This is a patch to pdftex that includes:

- a fix for bug #611

- some changes to ToUnicode support (patch #580) to make the
  implementation follow the guidelines at
  http://partners.adobe.com/public/developer/opentype/index_glyph.html

How to apply:
~~~~~~~~~~~~~
- this patch applies to the pristine pdftex-1.40.0-beta-20060928 sources
  only; if you have applied any other patches to the sources, please
  discard them and start from fresh sources.

- how to apply:

,--------
| cd /path/to/pdftex-1.40.0-beta-20060928/src
| cat /path/to/the/patch | patch -p1
| ./configure
| cd texk/web2c
| make pdftex
`--------

If you want to be careful, try the patch with the option '--dry-run'
first to see whether the patch can be applied without problems.

Usage:
~~~~~~
Add the following lines to your document, somewhere near the beginning:

,--------
| \input glyphtounicode.tex
| \pdfgentounicode=1
`--------

Customization:
~~~~~~~~~~~~~~
If pdftex cannot generate the right ToUnicode value for some glyphs
(probably because the glyph name is not ``known'' to pdftex), it's
possible to add further entries so pdftex can learn how to generate
unicode for such ``unknown'' glyphs. The syntax is simple:

,--------
| \pdfglyphtounicode{<glyph-name>}{<unicode-value>}
`--------

Example:

,--------
| \pdfglyphtounicode{A}{0041}
`--------

says that glyph 'A' maps to Unicode U+0041.

\pdfglyphtounicode requires that the second parameter consist only of
uppercase hexadecimal digits (0..9, A..F) and spaces. If this is not the
case, the entry is simply discarded (with a warning). Later entries
overwrite previous entries with the same name (1st arg).
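
The rules for the second parameter can be illustrated with a small Python
sketch. It is not how pdftex implements them; the function name add_entry,
the dictionary used as the table, and the decision to treat a value with
no hex digits at all as invalid are assumptions made for the example.

```python
import re
import warnings


def add_entry(table, glyph_name, value):
    """Store a glyph-to-unicode entry if the value is well formed.

    Returns True if the entry was stored, False if it was discarded.
    """
    # Spaces are ignored, so "0066 0066" is the same as "00660066".
    compact = value.replace(" ", "")
    # Only uppercase hexadecimal digits are accepted (assumption: at least one).
    if not re.fullmatch(r"[0-9A-F]+", compact):
        warnings.warn(f"invalid unicode value {value!r} for glyph {glyph_name!r}")
        return False
    # A later entry with the same glyph name overwrites the earlier one.
    table[glyph_name] = compact
    return True
```

For instance, add_entry(table, "ff", "0066 0066") stores "00660066",
while a value containing lowercase digits such as "00ab" is discarded
with a warning and leaves the table unchanged.
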

The entries in glyphtounicode.tex cover:

- glyphlist.txt (Adobe Glyph List v2.0)
- zapfdingbats.txt (ITC Zapf Dingbats Glyph List v2.0)
- texglyphlist.txt (lcdf-typetools texglyphlist.txt v2.33)
- additional.tex (additional entries)

Ligatures require a special form of ToUnicode. Example:

,--------
| \pdfglyphtounicode{ff}{00660066}
`--------

Here '0066' is the unicode string for 'f'. Spaces are ignored in the
second parameter of \pdfglyphtounicode, hence it is possible to write the
above command as

,--------
| \pdfglyphtounicode{ff}{0066 0066}
`--------

which is easier to read and understand IMO.

----------------------------------------------------------------------

Comment By: The Thanh Han (hanthethanh)
Date: 2006-07-19 06:08

Message:
Logged In: YES
user_id=710

Patch updated to fix a bug reported by Dohyun Kim.

----------------------------------------------------------------------

Comment By: The Thanh Han (hanthethanh)
Date: 2006-07-16 11:19

Message:
Logged In: YES
user_id=710

Patch updated with a bug fix from Akira.

----------------------------------------------------------------------