Okay, this is weird (and long), but it gets clearer near the end:

The following test using lpeg to split comma-separated values works perfectly (never mind that subpattern B does not do anything):

----------------------------------------------------------------------
% A:
% \catcode`\:=11
% \def\FM:ifFileIncluded#1{\message{whatever}}
%
% \FM:ifFileIncluded{}

\directlua0{\unexpanded{
    whiteSpace = lpeg.S(" \t\n")

    splitComma = lpeg.P({
        lpeg.Ct(lpeg.V("elem") * (lpeg.V("sep") * lpeg.V("elem"))^0),
        sep = lpeg.S(",{}"),
        elem = whiteSpace^0 * lpeg.C((1 - lpeg.V("sep"))^1) * whiteSpace^0, % B
    })
}}

\def\splitComma#1{%
    \directlua0{%
        local s = '\luaescapestring{\unexpanded{#1}}'
        local t = lpeg.match(splitComma,s)
        for k,v in ipairs(t) do
            texio.write_nl('[' .. v .. ']')
        end
    }%
}

\splitComma{A, B, C, D, E, F}

% `print' is not documented, but prints a compiled pattern's bytecode
% to the console
\directlua0{lpeg.print(splitComma)}

\end
-----------------------------------------------------------------------

The pattern created is:

    [1 = elem 2 = sep 3 = elem 4 = sep ]
    00: call -> 2
    01: jmp -> 51
    02: opencapture table(n = 0) (0)
    03: call -> 10
    04: choice -> 8 (0)
    05: call -> 41
    06: call -> 10
    07: partial_commit -> 5
    08: closecapture close(n = 0) (0)
    09: ret
    10: span [(09-0a)(20)]
    19: opencapture simple(n = 0) (0)
    20: choice -> 23 (0)
    21: call -> 41
    22: failtwice
    23: any * 1
    24: choice -> 30 (0)
    25: choice -> 28 (0)
    26: call -> 41
    27: failtwice
    28: any * 1
    29: partial_commit -> 25
    30: closecapture close(n = 0) (0)
    31: span [(09-0a)(20)]
    40: ret
    41: set [(2c)(7b)(7d)]
    50: ret
    51: end

However, if I run this with my own format, the pattern is:

    [1 = elem 2 = sep 3 = elem 4 = sep ]
    00: call -> 2
    01: jmp -> 51
    02: opencapture table(n = 0) (0)
    03: call -> 10
    04: choice -> 8 (0)
    05: call -> 41
    06: call -> 10
    07: partial_commit -> 5
    08: closecapture close(n = 0) (0)
    09: ret
    10: span [(0a)(20)(2e)(42)(45)(49)(4c-4d)(4f-50)(53)]
    19: opencapture simple(n = 0) (0)
    20: choice -> 23 (0)
    21: call -> 41
    22: failtwice
    23: any * 1
    24: choice -> 30 (0)
    25: choice -> 28 (0)
    26: call -> 41
    27: failtwice
    28: any * 1
    29: partial_commit -> 25
    30: closecapture close(n = 0) (0)
    31: span [(0a)(20)(2e)(42)(45)(49)(4c-4d)(4f-50)(53)]
    40: ret
    41: set [(2c)(7b)(7d)]
    50: ret
    51: end

So instead of checking for whitespace in instructions 10 and 31, input is checked against the character set "\n .BEILMOPS" (and splitting the list fails almost completely). I know what a "Beil" is (an axe), and I know what a "Mops" is (some kind of weird animal, a bit like a groundhog [think Bill Murray]), but what is a "Beilmops"? And a "dot Beilmops"? Is it written in C# or what?

Now you'll say "yeah, sure, who cares what weird stuff that weird Jonathan does in that weird format of his", but:

This only happens if a macro "\FM:ifFileIncluded" is defined or referenced. If this macro is called "\FM:ifFileLoadeded" (the same length) or "\FM:ifFileIlcluded" ("l" instead of "n"), the pattern is compiled correctly. "\FM:ifFileLncluded" works, too. But the moment a control sequence called "\FM:ifFileIncluded" is used (defined or referenced), the lpeg pattern contains that strange animal.

I tried to use Lua state 1 instead of 0 to make sure there were no definitions that could create a side effect, but the pattern remained the same.

I tried to uncomment (A) in the above PlainTeX code, but the pattern is still correct. So I first suspected some kind of overflow in TeX's hash table that only occurs when a lot of control sequences already exist and one of them has a very specific name and thus hash value (this does not seem to be the case, though).

I tried moving the definition of "\FM:ifFileIncluded" to the beginning of my format (right after setting the catcodes), but without success. I tried defining it in the PlainTeX format, but again to no avail. I tried removing all unnecessary files from my format, with the same result.
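
As a cross-check outside LuaTeX, the two span dumps from instructions 10 and 31 can be decoded back into characters. The following Python sketch is not part of the test file. The helper name `decode_lpeg_set` is made up for illustration, and it assumes the dump notation is a list of two-digit hex codes in parentheses, with a dash marking an inclusive range, as in the listings above.

```python
import re


def decode_lpeg_set(dump: str) -> str:
    """Expand a set dump such as '[(09-0a)(20)]' into the characters it names."""
    chars = []
    # Each entry is either '(xx)' or '(xx-yy)' with two-digit hex codes.
    for lo, hi in re.findall(r"\(([0-9a-f]{2})(?:-([0-9a-f]{2}))?\)", dump):
        first = int(lo, 16)
        last = int(hi, 16) if hi else first  # a single code is a range of one
        chars.extend(chr(code) for code in range(first, last + 1))
    return "".join(chars)


# Set compiled under PlainTeX (the expected whitespace set).
plain_set = decode_lpeg_set("[(09-0a)(20)]")
# Set compiled under the custom format.
format_set = decode_lpeg_set("[(0a)(20)(2e)(42)(45)(49)(4c-4d)(4f-50)(53)]")

print(repr(plain_set))   # '\t\n '
print(repr(format_set))  # '\n .BEILMOPS'
```

Decoded this way, the PlainTeX set is tab, newline and space, while the set from the custom format has lost the tab and gained ".BEILMOPS".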

More weirdness that I more or less accidentally stumbled upon:

    \directlua1{\unexpanded{lpeg.print(lpeg.S(" \t\n"))}}

results in

    set [(0a)(20)(2e)(42)(45)(49)(4c-4d)(4f-50)(53)]

which is wrong, but

    \directlua1{\detokenize{lpeg.print(lpeg.S(" \t\n"))}}

results in

    set [(09-0a)(20)]

which is correct. And the equivalent, but slightly longer

    \directlua1{lpeg.print(lpeg.S(" \string\t\string\n"))}

again results in the correct

    set [(09-0a)(20)]

And indeed: If I replace "\unexpanded" in the code example above by "\detokenize" (and remove the empty lines which for some reason result in "\par" when "\detokenize" is used, but not with "\unexpanded"), the pattern is compiled correctly and works as expected.

The plot thickens. Let's look further; how about simply telling Lua to print the string " \t\n"?

    \directlua1{\detokenize{texio.write_nl("[ \t\n]")}}

results in

    [ ]

but

    \directlua1{\unexpanded{texio.write_nl("[ \t\n]")}}

results in

    [ IMPOSSIBLE. ]

Not so weird anymore: "IMPOSSIBLE." is printed by procedure "print_cs" in luatex.web if the control sequence's pointer is below "active_base", that is zero or negative, or >= the pointer to the undefined control sequence (at least as far as I understand it). And "IMPOSSIBLE." sorted and stripped of duplicates is ... ".BEILMOPS"!

Also note that "\t" is defined in PlainTeX, but not in my format. If I define it at the beginning of the code example above, nothing changes. But if I define it before defining "\FM:ifFileIncluded", everything works as expected, and

    \directlua1{\unexpanded{texio.write_nl("[ \t\n]")}}

results in

    [ ]

Not "IMPOSSIBLE." anymore.

Now: If the control sequence passed to "print_cs" (or "tokenlist_to_cstring" in luatoken.c) is undefined, "IMPOSSIBLE." is printed. As "\t" is indeed undefined, this is completely expected. What's not expected is that this only happens if the macro "\FM:ifFileIncluded" is not defined before "\t" is defined (if at all).
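
The claim that "IMPOSSIBLE." accounts for the strange set can be checked in the other direction as well. The Python sketch below rebuilds the set notation from a string and compares it with the dumps quoted above. It is only an illustration: `encode_as_set` is a made-up helper that imitates the notation seen in the dumps (it is not lpeg's own code), and the string " IMPOSSIBLE.\n" is an assumption about what Lua receives once "\t" has been turned into "IMPOSSIBLE." (extra spaces would not change the set).

```python
def encode_as_set(text: str) -> str:
    """Render the distinct characters of text in the '(xx)' / '(xx-yy)' dump style."""
    codes = sorted({ord(ch) for ch in text})
    runs = []
    start = prev = codes[0]
    for code in codes[1:]:
        if code == prev + 1:      # extend the current run of consecutive codes
            prev = code
            continue
        runs.append((start, prev))
        start = prev = code
    runs.append((start, prev))
    body = "".join(
        f"({a:02x})" if a == b else f"({a:02x}-{b:02x})" for a, b in runs
    )
    return "[" + body + "]"


# What lpeg.S was meant to receive: space, tab, newline.
intended = encode_as_set(" \t\n")
# Assumed string when "\t" arrives as "IMPOSSIBLE." instead of a tab escape.
observed = encode_as_set(" IMPOSSIBLE.\n")
# "IMPOSSIBLE." sorted and stripped of duplicates.
letters = "".join(sorted(set("IMPOSSIBLE.")))

print(intended)  # [(09-0a)(20)]
print(observed)  # [(0a)(20)(2e)(42)(45)(49)(4c-4d)(4f-50)(53)]
print(letters)   # .BEILMOPS
```

Both strings reproduce the dumps character for character, which fits the reading that the pattern compiler itself is fine and only the string handed to Lua is wrong.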

And weird again: "\n" is defined by neither PlainTeX nor my format, but does not result in "IMPOSSIBLE.".

Side note:

    \immediate\write16{\detokenize{[ \t\n]}}
    \immediate\write16{\unexpanded{[ \t\n]}}

both correctly display "[ \t \n ]". So it seems that "\unexpanded" works as expected, but something else does not.

And finally: If I say "\let\t\t" at the beginning of my format, everything works as well. So "\t" may well be undefined, as long as it is entered into TeX's hash table before "\FM:ifFileIncluded" is.

Jonathan