Ruby 1.9.1 and non-ascii char parsing in .tui file

Jose Augusto

8 Aug 2009

9:16 p.m.

Hi all,

A few weeks ago I reported a problem with ruby 1.9.1, which was solved by removing the offending .tui line (Mojca and Hans, AFAIR). The problem was related to the presence of non-ASCII chars in the .tui file.

Sadly it strikes again, now when chars with accents appear in titles (sections, subsections, etc.). The parsing of the line marked at the end of this message, from a .tui file, fails in ruby 1.9.1 but not in ruby 1.8.7. The error returned is also shown. If I remove the chars with accents from the section title, all goes well. I'm using MkII (--context=current).

One of the advantages of ruby 1.9 over 1.8 is that it is 3 times faster... However, ruby made lots of changes in string manipulation and storage when moving from 1.8 to 1.9, and that must be the source of the problem.

I tracked the error to texutil.rb, line 1035:

when /^c (.*)$/o then @plugins.reader('MyCommands', [$1])

but then I got lost in the Classes/Modules jungle :-) in that script. Perhaps it is this procedure, at line 403 of texutil.rb, which triggers the error?

def MyCommands::reader(logger,data)
    @@commands.push(data.shift+data.collect do |d| "\{#{d}\}" end.join)
end

Thanks in advance for your support. If I can help in solving the problem, please point me to the task. I have some experience with ruby (I started using it when the 1st edition of the pickaxe book was published, around 2001) and with perl. But not with Lua :-)... Although I have already read quite a lot of Roberto's Lua book, I haven't started coding in Lua yet :-)

Kind Regards
J. Augusto

############## TUI file and triggered error ###################

## .tui snippet
........
c \mainreference{}{a}{2--0-1-1-0-0-0-0--1}{1}{1.1}
...
c \listentry{subsection}{3}{1.1.1}{Title with accents: Ãçê}{2--0-1-1-1-0-0-0--1}{1}
c \mainreference{}{b}{2--0-1-1-1-0-0-0--1}{1}{1.2}

### error
pdfTeX warning: pdftex.exe: no GlyphToUnicode entry has been inserted yet!
Output written on test1.pdf (1 page, 72793 bytes).
Transcript written on test1.log.
TeXUtil | parsing file test1.tui
TeXUtil | fatal error in parsing test1.tui
TeXUtil | check loading of file 'test1', begin/end problem
TeXUtil | shortcuts : 169
TeXUtil | expansions: 308
#############################
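For readers who want to see the failure in isolation, here is a minimal sketch in Ruby (it needs Ruby 2.0 or later because of String#b). It assumes the .tui line holds ISO-8859-1 bytes and that Ruby has tagged the line as US-ASCII. The names ACCENTED, tui_line and parse_command are made up for this illustration and do not come from texutil.rb. With those assumptions, the same /^c (.*)$/ pattern raises on the US-ASCII tagged line but matches when the line is treated as raw bytes.

```ruby
# Latin-1 bytes for the accented chars in the title (0xC3 0xE7 0xEA).
ACCENTED = [0xC3, 0xE7, 0xEA].pack("C*")

# Hypothetical helper: build a .tui-like line and tag it with the given encoding.
def tui_line(encoding)
  line = "c \\listentry{subsection}{3}{1.1.1}{Title: ".b + ACCENTED + "}".b
  line.force_encoding(encoding)
end

# Hypothetical helper: same pattern as the 'when' clause quoted from texutil.rb.
def parse_command(line)
  line =~ /^c (.*)$/ ? $1 : nil
end

begin
  parse_command(tui_line("US-ASCII"))
rescue ArgumentError => e
  # High bytes are not valid US-ASCII, so the regexp match refuses the string.
  puts "US-ASCII tagged line fails: #{e.message}"
end

# Tagged as raw bytes, every byte value is acceptable and the match succeeds.
puts "binary tagged line parses: #{!parse_command(tui_line('ASCII-8BIT')).nil?}"
```

The point of the sketch is that the bytes are identical in both calls; only the encoding tag attached to the string differs.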


Hans Hagen

9 Aug 2009

9:57 p.m.

Jose Augusto wrote:

...

when /^c (.*)$/o then @plugins.reader('MyCommands', [$1])

what if you remove the o (/o)

can you find out what changed between 1.8 and 1.9 ... actually 1.9 is the stepping stone to 2.0 and 2 versions can be incompatible to 1 versions

also, can you make a test file so that we can see if there's a platform dependency?

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------

Jose Augusto

10 Aug 2009

5:15 a.m.

Hi all,

OK, here it goes. Attached are the files used in the test.

The problem as reported in the >> previous email << used the file with the offending chars wrapped in a main file, which was just:

\starttext
\input zzz.tex
\stoptext

That is, the offending chars were in zzz.tex.

In that example I noticed the error because the cross-refs in the equation numbering were not working. The parsing of the .tui file by ruby 1.9.1 failed. Then I saw the errors.

But then I made >> a single context file <<, which is attached together with the correct results (tui, tuo, pdf) obtained with ruby 1.8.7.

However, when I run ruby 1.9.1 with patch 129 (the latest one) on this single tex file (attached), now the first pass doesn't work! Here is the result (on Windows XP), which shows that ruby 1.9.1 doesn't like the non-US-ASCII chars :-)

F:\ANOS\ano09-10-pen\NotasProcSinal>texexec test1.tex
TeXExec | processing document 'test1.tex'
TeXExec | no ctx file found
D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/tex.rb:946:in `===': invalid byte sequence in US-ASCII (ArgumentError)
    from D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/tex.rb:946:in `scantexcontent'
    from D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/tex.rb:1907:in `processfile'
    from D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/tex.rb:1143:in `block (2 levels) in processtex'
    from D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/tex.rb:1133:in `timedrun'
    from D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/tex.rb:1142:in `block in processtex'
    from D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/tex.rb:1139:in `each'
    from D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/tex.rb:1139:in `processtex'
    from D:/Context/tex/texmf-context/scripts/context/ruby/texexec.rb:63:in `process'
    from D:/Context/tex/texmf-context/scripts/context/ruby/texexec.rb:53:in `main'
    from D:/Context/tex/texmf-context/SCRIPTS/CONTEXT/ruby/base/switch.rb:133:in `execute'
    from D:/Context/tex/texmf-context/scripts/context/ruby/texexec.rb:787:in `<main>'

Thanks all for your interest,

Kind regards
J. Augusto

On Sun, Aug 9, 2009 at 8:57 PM, Hans Hagen wrote:

...


___________________________________________________________________________________ If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context webpage : http://www.pragma-ade.nl / http://tex.aanhet.net archive : https://foundry.supelec.fr/projects/contextrev/ wiki : http://contextgarden.net

___________________________________________________________________________________

Hans Hagen

3:10 p.m.

Jose Augusto wrote:

...


ruby 1.9 internally is no longer 8 bit clean i.e. there is always an encoding (file as well as internal); there is no way to enforce this (there is the -E option but that is useless for 1.8)

i now open some files explicitly in binary mode; maybe that helps; i have no clue what happens with string manipulations later on

i always liked ruby but such fundamental changes (encoding, dropping functions etc) without renaming the program are a showstopper for me as one cannot predict what will be on the user's system

it looks like i have to convert the texutil part to lua (takes a few days and since i mostly use luatex it has a low priority)

i uploaded a beta for testing

Hans
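The binary-mode approach mentioned above can be sketched as follows. As far as I know, on Ruby 1.9 and later opening a file with mode "rb" tags the data as ASCII-8BIT unless another encoding is given, while on 1.8 the same call simply reads bytes, so one code path can serve both. The helper name read_lines_binary and the sample file are assumptions for this sketch, not code from texexec.rb.

```ruby
require "tempfile"

# Hypothetical helper: read all lines as raw bytes, whatever the locale is.
def read_lines_binary(path)
  File.open(path, "rb") { |f| f.readlines }
end

Tempfile.create("sample-tui") do |tmp|
  tmp.binmode
  # A .tui-like line with Latin-1 bytes for the accented chars.
  tmp.write("c \\listentry{subsection}{Title: ".b + [0xC3, 0xE7, 0xEA].pack("C*") + "}\n".b)
  tmp.flush

  read_lines_binary(tmp.path).each do |line|
    # Binary strings have no invalid byte sequences, so the match does not raise.
    puts "#{line.encoding}: matched" if line.chomp =~ /^c (.*)$/
  end
end
```

The trade-off is that the strings carry no character semantics: they are bytes, which is what the 1.8 scripts assumed anyway.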

Jose Augusto

5:21 p.m.

Hi Hans,

I just sent a mail with a possible patch, before I read this answer from you :-) As I say there, the patches work (at least for me), and I had updated context mkii a few hours ago, so I don't know if the betas you mentioned have already been installed...

I hope the proposed patches are helpful...

Thanks very much for your answer.

J. Augusto

On Mon, Aug 10, 2009 at 2:10 PM, Hans Hagen wrote:

...


Hans Hagen

6:27 p.m.

Jose Augusto wrote:

...


your patch will not work with ruby < 1.9 so if my patch (opening files in rb mode) works ok that's more robust;

another option is to patch texmfstart.rb

#!/usr/bin/env ruby
#encoding: ASCII-8BIT

Jose Augusto

7:20 p.m.

Hi Hans,

The patch I proposed also works with ruby older than 1.9 (e.g. ruby 1.8.7)! The force_encoding() method is used only if RUBY_VERSION >= 1.9. If the scripts are executed by ruby 1.8 or an earlier version, the current line of code (e.g. 'case line.chomp') is left unchanged. Also, I verified the patch with ruby 1.8.7 and with 1.9.1, and it worked in both cases.

The patch does, however, slow down processing (the "if" is executed when parsing each line of the files, and this could probably be optimized...).

Meanwhile, I don't think that the magic string

# encoding: ASCII-8BIT

solves the problem. This string indicates that the script is written in ASCII-8BIT, but when reading strings from the .tex or .tui files, ruby 1.9.1 considers them to be US-ASCII regardless of the encoding declared in # encoding: ...

I introduced " # encoding: ASCII-8BIT " in texmfstart.rb, tex.rb and texutil.rb and the problem didn't disappear :-(

Of course I may be wrong, but the experiments I did make me think this way. Also, I don't have Linux at my disposal (I mean, with context installed), and the behavior may be different there...

Kind regards and thank you very much.

J. Augusto

On Mon, Aug 10, 2009 at 5:27 PM, Hans Hagen wrote:

...
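The distinction drawn above between the encoding of the script and the encoding of the data it reads can be checked with a small sketch. It relies on my understanding that, in Ruby 1.9 and later, a plain string literal is tagged with the source encoding (what the magic comment sets), whereas data read from a file in text mode is tagged with Encoding.default_external, which comes from the environment. The helper name literal_and_read_encodings is hypothetical.

```ruby
require "tempfile"

# Hypothetical helper: returns [encoding of a literal, encoding of data read from path].
def literal_and_read_encodings(path)
  literal = "c "              # tagged with the source encoding of this script
  data    = File.read(path)   # tagged according to the external encoding in effect
  [literal.encoding, data.encoding]
end

Tempfile.create("sample") do |tmp|
  tmp.binmode
  tmp.write([0xC3, 0xE7, 0xEA].pack("C*"))
  tmp.flush

  lit, read = literal_and_read_encodings(tmp.path)
  puts "source encoding:   #{__ENCODING__}"
  puts "literal tagged as: #{lit}"
  puts "default_external:  #{Encoding.default_external}"
  puts "file data tagged:  #{read}"
end
```

If this understanding is right, changing the magic comment moves the first two values only, which would match the experiment reported in the message.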


Hans Hagen

7:39 p.m.

Jose Augusto wrote:

...

Meanwhile I don't think that the magic string # encoding: ASCII-8BIT solves the problem. This string indicates that the script is written in ASCII-8BIT, but when reading strings from the .tex or .tui files, ruby 1.9.1 considers them to be US-ASCII regardless of the encoding declared in # encoding: ...

not when opened as 'rb' (which i do in the latest texexec.rb) so i wonder why that does not work at your place

(http://blog.nuclearsquid.com/writings/ruby-1-9-encodings)

i run ruby 1.8.6 (and on a couple of servers even older versions and i'm not going to touch ruby on these machines (i don't want to patch scripts that are supposed to run another 5-10 years) but i might update context and texexec)

...

I introduced " # encoding: ASCII-8BIT " in texmfstart.rb, tex.rb and texutil.rb and the problem didn't disappear :-(

hm, it worked here

...

Of course I may be wrong. But the experiments I did make me think this way. Also, I don't have Linux at my disposal (I mean, with context installed) and there the behavior perhaps is different...

that's my biggest fear ... introducing more problems

Hans

Jose Augusto

8:11 p.m.

Hi Hans,

I ran just now ruby 1.8.6 and the force_encoding() patch worked well.

I have just upgraded with "--context=current". The banner in texexec.rb is

banner = ['TeXExec', 'version 6.2.1', '1997-2009', 'PRAGMA ADE/POD']

and the date of this script (after updating) is 10-04-2009 (it's April...).

I'm running mkii. How do I get the mkii beta scripts, such as the texexec.rb you mention?

All my rubys are compiled from the box with mingw on Windows (2000 or XP, on 3 different machines). Of course the encoding thing is different in Linux, Windows (and DOS prompts, for that matter), so there is probably different behavior in the ruby/context/tex interaction with chars on Linux and Windows boxes...

Thanks
Jose

On Mon, Aug 10, 2009 at 6:39 PM, Hans Hagen wrote:

...


Hans Hagen

8:20 p.m.

Jose Augusto wrote:

...

Hi Hans,

I ran just now ruby 1.8.6 and the force_encoding() patch worked well.

yes, but if we can avoid adapting all those strings ... i'm pretty sure that if we follow that route we have to patch a lot

also keep in mind that in 1.9 there are several encodings (external and internal) so setting up a roundtrip using the string properties involves more patches

...

I have just upgraded with "--context=current". The banner in texexec.rb is banner = ['TeXExec', 'version 6.2.1', '1997-2009', 'PRAGMA ADE/POD'] and the date of this script (after updating) is 10-04-2009 (it's April...).

I'm running mkii. How do I get mkii beta scripts, as texexec.rb you mention?

it depends: if (on linux) "texexec" is a big file then you need to copy texexec.rb to texexec, else if it's a stub it should just work (in that case texmfstart will start texexec.rb)

...

All my rubys are compiled from the box with mingw on Windows (2000 or XP, on 3 different machines). Of course the encoding thing is different in Linux, Windows (and DOS prompts, for that matter), so there is probably different behavior in the ruby/context/tex interaction with

on windows there should be a stub (something like texexec.cmd == "ruby texmfstart texexec ...")

Hans
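On the remark above that Ruby 1.9 has both an external and an internal encoding: one way to see the two in action is the "r:external:internal" open mode, which transcodes while reading. This is only a sketch of the mechanism, assuming the file is stored in ISO-8859-1. It is not what the ConTeXt scripts do, and the helper name read_latin1_as_utf8 is made up.

```ruby
require "tempfile"

# Hypothetical helper: external encoding ISO-8859-1 (on disk),
# internal encoding UTF-8 (what the program sees).
def read_latin1_as_utf8(path)
  File.open(path, "r:ISO-8859-1:UTF-8") { |f| f.read }
end

Tempfile.create("latin1") do |tmp|
  tmp.binmode
  tmp.write([0xC3, 0xE7, 0xEA].pack("C*"))   # three Latin-1 accented chars
  tmp.flush

  text = read_latin1_as_utf8(tmp.path)
  puts "#{text.encoding}: #{text.length} chars, #{text.bytesize} bytes"
end
```

A full round trip would need the matching mode on the writing side as well, which is the extra patching the message warns about.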

Jose Augusto

5:15 p.m.

Hi all,

I think I solved the problem. At least for my current errors...

I read the following article about string encoding in ruby 1.9 and up:

http://blog.grayproductions.net/articles/ruby_19s_string

With that info at hand, I made two >> brute-force "trial" patches << (read the above article to see why I call them "brute force" :-) ) in two of the ruby context files where problems were arising (original line numbers are shown):

### .../scripts/context/ruby/base/tex.rb
## (946)
case str.chomp
===>>>
str = str.force_encoding("ISO-8859-1") if RUBY_VERSION >= "1.9"
case str.chomp

### .../scripts/context/ruby/base/texutil.rb
### (1033)
case line.chomp
===>>>
line = line.force_encoding("ISO-8859-1") if RUBY_VERSION >= "1.9"
case line.chomp

The error is due to the fact that, by default, ruby 1.9 considers strings as US-ASCII and complains when it finds chars not in that encoding.

I don't know how to solve the problem for people writing in an encoding other than ISO-8859-1. I tried the above with UTF-8 instead of ISO-8859-1 and it didn't work.

Finally, I don't know if there are any other places (at least case expressions) in the context ruby scripts where the problem might also show up.

Kind regards,
J. Augusto

On Mon, Aug 10, 2009 at 4:15 AM, Jose Augusto wrote:

...
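The two patches above can be illustrated with a small sketch. It swaps the RUBY_VERSION comparison for a respond_to? check (my substitution, so that Ruby 1.8 is left untouched without comparing version strings), and it shows one plausible reason why retagging as UTF-8 did not help: bytes written in ISO-8859-1 are usually not a valid UTF-8 sequence, whereas every byte value is valid ISO-8859-1. The helper name retag is an assumption.

```ruby
# Hypothetical helper: retag a string on Ruby 1.9+, leave it alone on 1.8,
# where String has no force_encoding method.
def retag(str, encoding = "ISO-8859-1")
  str.respond_to?(:force_encoding) ? str.force_encoding(encoding) : str
end

# The accented chars from the test title, as stored on disk in ISO-8859-1.
latin1_bytes = [0xC3, 0xE7, 0xEA].pack("C*")

as_latin1 = retag(latin1_bytes.dup)
as_utf8   = retag(latin1_bytes.dup, "UTF-8")

# valid_encoding? tells whether the bytes make sense under the attached tag;
# a regexp match only works on strings for which this is true.
puts "ISO-8859-1 valid: #{as_latin1.valid_encoding?}"
puts "UTF-8 valid:      #{as_utf8.valid_encoding?}"
```

Note that force_encoding only changes the tag and never the bytes, so it helps only when the chosen encoding is the one the file was really saved in.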

