Arthur Reutenauer
He means "ActualText tags" :-) See the PDF spec section 14.9.4, page 623. It's a more generic way to support searching than ToUnicode vectors: you just specify the actual string of underlying Unicode characters. The PDF spec uses hyphenated "ck" in German as an example: you typeset "Druk-ker" but you want to search for "Drucker". You can't do that with ToUnicode vectors.
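As a rough illustration of what such a tag looks like in a page's content stream, here is a minimal Python sketch that wraps a text-showing operator in a `/Span` marked-content sequence carrying an ActualText entry. The helper names `actual_text_hex` and `span` are made up for this example, and the `(k-) Tj` operands assume a font whose encoding happens to match ASCII; line positioning is left out entirely.

```python
def actual_text_hex(text: str) -> str:
    """Encode text as a PDF hex string: UTF-16BE preceded by the FE FF byte-order mark."""
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


def span(actual: str, show_ops: str) -> str:
    """Wrap content-stream operators in a /Span marked-content sequence
    whose property list carries the replacement (ActualText) string."""
    return f"/Span << /ActualText {actual_text_hex(actual)} >> BDC\n{show_ops}\nEMC"


# The typeset glyphs are "Dru", "k-", "ker"; the "k-" run is tagged so that
# extraction yields "c" there, giving "Drucker" overall.
# (Assumption: the strings in parentheses map directly to the right glyphs.)
fragment = "\n".join([
    "(Dru) Tj",
    span("c", "(k-) Tj"),
    "(ker) Tj",
])
print(fragment)
```

The same helper would cover the discretionary-hyphen case by passing `"\u00ad"` as the replacement text for a run that shows an ordinary hyphen glyph.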
You also need ActualText tags to mark the difference between a discretionary hyphen and an explicit hyphen in English, which programs like Reader use when extracting text. When the hyphen is discretionary you set the ActualText to U+00AD (soft hyphen) instead of U+002D (hyphen-minus). (That's mentioned somewhere in the PDF spec.) Another thing I just thought of that isn't always done: there should be explicit space characters between words, including at the ends of lines, although I'm not sure whether Adobe Reader turns off its word-boundary heuristics if it sees space characters.
Since what I enjoy doing is making e-books that can be searched and, perhaps more importantly, extracted from via the Select tool, it's important to me that search, selection, and extraction work. I'll use them myself if I choose, for instance, to quote from an e-book I made. I've added ActualText tags in my (heavily) modified version of ant, but that's in a primitive state, a long-term project that competes with font-making and e-book-making for time, and so I'd like to have ConTeXt as well. I like ConTeXt a lot.
Also, I noticed when playing around with the examples from the "Th" ligature discussion that searching and extraction didn't work with small caps, though they did work with the ligature. With ActualText tags these things always work, regardless of the ToUnicode map's contents.
The way Cairo's PDF backend handles this is to use an ActualText tag for any glyphs that aren't included in the font's encoding. What I did in my modified ant is to generate a ToUnicode map from the Adobe glyph naming convention (http://www.adobe.com/devnet/opentype/archives/glyph.html) and then put an ActualText tag on anything that happens not to match what you would get from the ToUnicode mapping.
(For reasons that were stupid, I once created a lame little C library to do the mapping from glyph names to Unicode, using a compressed lookup trie: http://code.google.com/p/kompostilo/source/browse/#svn/trunk/support-librari... )
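The glyph-name approach described above might be sketched roughly as follows in Python. This is not the code from the modified ant or from the C library; it is a simplified reading of the Adobe glyph naming convention (drop everything from the first period, split ligature components on underscores, then try a glyph list, `uniXXXX`, and `uXXXX` forms). `AGL_SUBSET` is a tiny stand-in for the real Adobe Glyph List, and `glyph_name_to_unicode` and `actual_text_for` are hypothetical names.

```python
import re

# Tiny stand-in for the Adobe Glyph List; the real list has thousands of entries.
AGL_SUBSET = {"T": "T", "h": "h", "a": "a", "c": "c", "k": "k",
              "hyphen": "-", "space": " "}

_UNI = re.compile(r"uni((?:[0-9A-F]{4})+)")  # uniXXXX, possibly several groups
_U = re.compile(r"u([0-9A-F]{4,6})")         # uXXXX up to uXXXXXX


def _valid(cp: int) -> bool:
    """Accept Unicode scalar values only (no surrogates, nothing past U+10FFFF)."""
    return cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def glyph_name_to_unicode(name: str) -> str:
    """Derive the string a ToUnicode entry would carry for this glyph name."""
    base = name.split(".", 1)[0]        # "a.sc" -> "a", ".notdef" -> ""
    out = []
    for part in base.split("_"):        # "T_h" -> ["T", "h"]
        if part in AGL_SUBSET:
            out.append(AGL_SUBSET[part])
            continue
        m = _UNI.fullmatch(part)
        if m:
            digits = m.group(1)
            cps = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
            if all(_valid(cp) for cp in cps):
                out.extend(chr(cp) for cp in cps)
            continue
        m = _U.fullmatch(part)
        if m and _valid(int(m.group(1), 16)):
            out.append(chr(int(m.group(1), 16)))
    return "".join(out)


def actual_text_for(glyph_names, intended):
    """Return the ActualText string a run needs, or None when the
    name-derived ToUnicode text already equals the intended text."""
    derived = "".join(glyph_name_to_unicode(n) for n in glyph_names)
    return None if derived == intended else intended
```

Under these assumptions, `actual_text_for(["T_h"], "Th")` needs no tag, while `actual_text_for(["k", "hyphen"], "c")` and `actual_text_for(["hyphen"], "\u00ad")` both ask for one, which matches the "tag only what doesn't already match" idea.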