On 2005-06-25 03:52:01 +0200, Heiko Oberdiek wrote:
\pdfmatch [icase] [subcount <number>] {<pattern>}{<string>}
Implements pattern matching using the POSIX regex library (a standard library, at least on my Linux system). It returns the same values as \pdfstrcmp, but with the following semantics:
  -1: error (invalid pattern, ...)
   0: no match
   1: match found
Options:
  * icase: case-insensitive matching
  * subcount: sets the size of the table for found subpatterns. A value of -1 resets the table size to its initial default.
See the manual pages regex(3) and regex(7).
The implementation shows a possible interface to pattern matching in TeX, so only the basics are implemented. Flags:
  * REG_EXTENDED is always set by the implementation.
  * REG_ICASE can be set by the user (option icase).
  * Other flags are not implemented.
\pdflastmatch <number>
The result of \pdfmatch is stored in an array. Entry 0 contains the whole match; the following entries contain the submatches. The positions of the matches are also available. To avoid another primitive, they are encoded as follows:
  <position> "->" <match string>
"->" is used as the separator in the tradition of \meaning, and there are existing macros for parsing the output of \meaning (e.g. \strip@prefix in LaTeX). Position -1 with an empty string indicates that the entry is not set. Positions count from 0, as the example shows.
Example:
  \def\msg#{\immediate\write16 }
  \msg{\pdfmatch{(l+)o (W(o))}{Hello World}}
  \msg{\pdflastmatch0}
  \msg{\pdflastmatch1}
  \msg{\pdflastmatch2}
  \msg{\pdflastmatch3}
  \msg{\pdflastmatch4}
Result:
  1
  2->llo Wo
  2->ll
  6->Wo
  7->o
  -1->
Alternative: PCRE (Perl-compatible regular expressions) is far more powerful: more options, named subpatterns, ... The license of PCRE 4.x was GPL-compatible; since 5.0 it is BSD. The current major version is 6.
The TeX interface could be changed in the following way:
  * Addition: \pdflastmatchbyname <general text>
    It extracts the matches of named subpatterns.
  * Options could be given by the same names as in the PCRE documentation:
      \pdfmatch anchored caseless ... {}{}
    For easier and faster scanning, the options could be required to appear in sorted order.
  * Or options could be given as letters, in any order, in an additional argument:
      \pdfmatch{<pattern>}{<options>}{<string>}
      \pdfmatch{l+}{ai}{Hello World}
    The implementation could then use strchr to check whether an option is set.
Patch instructions for testing are given in the patch description on Sarovar.
While this is a VERY nice feature, I'm reluctant to include it in 1.30.0 because
  - we are (in theory at least) in feature freeze, and this is definitely a new feature :-)
  - it may need more testing
  - I doubt that regex.h is portable; we should keep Windows in mind.
Comments?

Best regards
Martin
-- 
http://www.tm.oneiros.de