Re: [Dev-luatex] plugin for external formatting
Hi,

this is a long thread and too many things are being discussed at the same time, so I will need some time to read and understand what is going on.

My first thought is that some small modifications to \showlists and \showbox will help a lot. It's easy to write additional info like the dimensions of each item in the list, or in the case of characters the filename of a tfm with fontsize (or we may write the dimensions of each char as Hans suggested, but this is overkill IMHO).

My feeling is that we need to work out the specification and format of the ``node list'' first. In the first step, I would prefer to have only node-specific things, e.g. only what comes out after a box construction. I also got a similar request: to provide a primitive that writes out the content of a box and another primitive to re-construct that box back from the output. We can start with this and make further extensions later on.

At the moment I cannot see clearly what is needed, but I am willing to write some extensions so that we can experiment to see what is really needed and perhaps change what has been done.

Thanh

On Tue, Sep 20, 2005 at 11:58:08AM +0200, Karel Skoupy wrote:
Hi all,
as Hans has already mentioned, my concern with luatex is to have some interface/protocol for formatting the TeX stuff externally.
Without going into details now, I'm interested in alternative algorithms for formatting not only paragraphs, but the whole stream. For TUG 2005 I have written a prototype which doesn't use any TeX code at all (it just parasitizes ADvi code for getting some metric information and showing the results). For a long time I planned to make a whole new system from scratch, but for several reasons that was reconsidered, and Hans proposed a way (a plugin mechanism for an external engine) to cooperate with TeX, so that TeX could benefit from the new algorithm and I can concentrate on the core stuff.
So basically I need a stream of (character) boxes, glues, penalties, ... (is there a simple unambiguous notion for all that?) in a preprocessed form (I don't care about input and macro handling) plus some parameters (standard paragraph breaking parameters and the new special ones) and I will return a stream of fixed boxes.
'I' will often mean 'the engine' depending on the context :-)
In the first stage, I won't need lua (or any changes to TeX) at all. I plan to use \showlists for my input stream and to generate a standard TeX input file for reading the result back. Of course, it won't be so simple: there will be some macro programming and trickery, which will make the whole thing complicated, fragile, unreliable, and inefficient for real use. Therefore some hooks from the actively developed TeX will probably be useful for making the cooperation of TeX and the external engine smooth. It might use lua or not, we will see; in any case I would like to keep the plugin support generic and (complete but) minimal.
I will now list the aspects of the communications between TeX and the engine which I have thought of so far. I will be glad if you can just think about it for the moment and give me some feedback if you will.
* single paragraph stuff
I need:
(1) complete representation of all the stuff which is to be returned formatted
(2) sizes of all the objects which are involved in formatting
(3) properties which influence the formatting (breakable, discardable, ...)
It seems that the standard output of \showlists (or \showbox) will mostly do. (1) is fulfilled I guess (the returned input needs to be only slightly modified to fit TeX).
(2) is a little bit tricky, because for the characters I get only an id of the font. So I will need to know the exact reference to a real font to get the metrics information. This can be learned by e.g. \show\tenrm. But of course it is not known in advance what fonts are used in the paragraph, so either all fonts can be listed at the beginning -- but where to get the list of all font definitions, and the definitions can actually change in the middle of the paragraph -- or I can make a first pass, collect the font ids and ask for them in the second pass. It will be a bit tricky and won't be reliable due to redefinitions (I can also change the current id using \let and lose the old id (still used in the log), right?), so it will be OK for experimenting, but for a real version I will need better support from TeX.
(3) is implicit, right?
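A minimal sketch of how point (3) can be derived from node types alone, assuming the line-break rules as stated in The TeXbook (a break at glue preceded by a non-discardable item, at a kern followed by glue, at a penalty, or at a discretionary). The node kind names and the function `legal_breakpoints` are hypothetical, and the sketch deliberately ignores math mode and penalty values (a penalty of 10000 or more would forbid the break).

```python
# Node kinds that TeX discards after a line break.
DISCARDABLE = {"glue", "kern", "penalty", "math"}


def legal_breakpoints(kinds):
    """Return the indices in a list of node kinds where a line break is allowed.

    Simplified: math mode and penalty values are not considered.
    """
    result = []
    for i, kind in enumerate(kinds):
        prev = kinds[i - 1] if i > 0 else None
        nxt = kinds[i + 1] if i + 1 < len(kinds) else None
        if kind == "glue" and prev is not None and prev not in DISCARDABLE:
            result.append(i)          # glue after a non-discardable item
        elif kind == "kern" and nxt == "glue":
            result.append(i)          # kern immediately followed by glue
        elif kind in ("penalty", "discretionary"):
            result.append(i)          # explicit break candidates
    return result


kinds = ["char", "kern", "char", "glue", "glue", "char",
         "discretionary", "char", "kern", "glue", "penalty"]
breaks = legal_breakpoints(kinds)
```

So as long as the dump names each node's type, the engine would not need a separate "breakable" flag per node.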
* stream of paragraphs
I may even need the whole chapter, because I want to treat
- shapes and layouts, which are relative to the page and not to a particular paragraph
- pagination, float placement
For the basic experimenting I can redefine \par to something like \hfil\break\indent but it will restrict all kinds of things which can happen between the paragraphs (in vertical mode). Of course, the whole thing will never be compatible with TeX, because TeX expects after \par that the last paragraph was formatted and placed on the vertical list. So it will be the responsibility of the user/macro-programmer to bear the consequences of using the alternative mechanism. Nevertheless, the consequences should be as small as possible.
So for the prototyping I can redefine \par or perhaps I can store the whole paragraphs in infinite hboxes (redefining \hsize?) or maybe I can use some \specials for tagging, but for the production version, this will be a very tricky part. Not so much for the engine, but mainly on the TeX side. It should be of a great concern for people who would want to use the new algorithms in their systems (Hans?), (after those ideas are first tested by a prototype :-).
* passing the parameters specific to the new algorithms
- layouts, shapes
- maybe others, like weights for resolving paragraph versus page breaking
This will be a new thing so I hope that there is no compatibility burden.
* hyphenation
It will be a lot of additional work, but I think that I should handle it locally. There are two reasons:
(1) the protocol for failing and getting the list with new discretionaries (TeX's 2nd pass) for every individual paragraph would be extremely complicated, in the end it might be more difficult than handling it locally.
(2) TeX's hyphenation mechanism is IMHO one of the crappiest parts of TeX. I mean the way how the (non)ligatures are screwed up for discretionaries which are not used in the end. So if it is handled locally, it will be IMO simpler and more correct. There are also some research results concerning hyphenation, which are not implemented in TeX, because it would be too complicated.
At the first stage, I'll omit the hyphenation completely.
At the moment, I don't remember anything else. I'm looking forward to your feedback.
--ksk
_______________________________________________
Dev-luatex mailing list
Dev-luatex@ntg.nl
http://www.ntg.nl/mailman/listinfo/dev-luatex
Thanh Han The wrote:
My feeling is that we need to work out the specification and format of the ``node list'' first. In the first step, I would prefer to have only node-specific things, e.g. only what comes out after a box construction. I also got a similar request: to provide a primitive that writes out the content of a box and another primitive to re-construct that box back from the output. We can start with this and make further extensions later on.
At the moment I cannot see clearly what is needed, but I am willing to write some extensions so that we can experiment to see what is really needed and perhaps change what has been done.
i'd say ... go ahead, so that we get a picture; at least we then have a starting point for karel's work

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
Hi all,

On Wed, 21 Sep 2005, 10:52:08, Hans Hagen wrote:
OK, but that won't bring much, just some funny shapes.
sure, but on the other hand, it can be used to 'replace' the current par builder by a more advanced (e.g. hyphenation) one, imagine that we have:
\paroutput
  {write list to file (or pipe)
   call plugin in one-paragraph mode
   read list from file (or pipe)}
that way we can replace the current par builder, because by default it's something equivalent to:
\paroutput{\scanlist\expandafter{\the\list255}}
Sorry, I don't understand this very well. \list# is what we want to have, right? And \scanlist is a primitive of eTeX? I'm not used to it, but I believe that the way \scanlist works is the best fit for TeX macro programming. Where can I see any examples to understand it better? Then, \paroutput should be analogous to \output, right? Then \scanlist\expandafter{\the\list255} would just put the paragraph on the current list, right? But where is the paragraph broken by the default algorithm? If it is before activating \paroutput, then it is too late to rebreak (some information is lost and it would be a waste of processor time anyway), so it can only be afterwards, but how would it recognize whether the list needs breaking by the default breaker or not? Anyway, \output and ending a paragraph are not analogous because \output is asynchronous and \par is synchronous (\paroutput would be activated by \par?). Or do you think that \paroutput should be just for one line when considering a break (that would be terribly complicated)? Sorry, I'm confused, but maybe I just take this line too seriously :-)
i wonder how hard this is to implement, you and taco should know -)
Of course handling just one paragraph externally must be much easier than handling several, especially because nothing substantial in the TeX model has to be changed (one atomic operation would be replaced by another one on the same data). But for my research it has almost no value; I really need to work with whole chunks and layout chains if I want to achieve anything interesting.
Not only page crossing, but also column/shape/container crossing ... The problem is that we are used to \parshape, which just specifies something for certain lines in the current paragraph. But if we want to introduce real page layouts, then the shapes are not relative to the paragraphs any more. It will be a matter of formatting where a particular paragraph starts in the layout.
it's a combination:
- a main gutter shape (can be columns or whatever)
- shapes bound to places on the gutter
- shapes bound to specific places in the stream
- shapes that may float (within boundary condition)
That's right.
But concerning the metric files, if I want to treat hyphenation locally, then I also need the kerning and ligature programs. In TeX it is done too early (and then it is taken apart and (wrongly) reconstructed during hyphenation pass). I want to do ligatures and kernings on demand, basically after hyphenation (it's not that simple, but anyway).
how about a font daemon, that one could cache/access font files; we need to go open type anyway so maybe such a daemon can be built on top of existing (non tex) libraries (port 31415)
A real daemon or a C library linked to the application (like kpathsea)? Well, there should be a library which provides a simple interface to a client in any case; that library can (optionally) communicate with a daemon, which is how client-daemon setups usually work.
-)
that's indeed too hard-coded for our purpose, so, next to a font daemon, we need a hyphenation daemon
I would start with a generic interface providing fonts and hyphenation. Whether it accesses the files or communicates with a daemon is an internal matter of the module implementing the interface.
Maybe we should make a whole new glossary. For example, 'node' is quite OK for everything in the list (char, box, glue, penalty, ...), but 'list' is so ambiguous that there should be something more specific (maybe 'node list'). TeX itself doesn't give clear names (classes) for those objects. I had to make up names for them in NTS (to name the classes), maybe we can look into it.
good idea; we indeed need to define proper names and descriptions; can you make a proposal for that based on your nts experiences?
Well the whole problem is, that there are no properly defined data structures with unique type names in TeX (just some all-purpose data structures and macros which one must be extremely careful with). Then there is no explicit need for unique names of different data types. So in NTS, I had to make names for types which are no explicit types in TeX. The two most outstanding examples are:
* NodeList (nothing new, just the 'node' is always explicit)
* Builder (currently built vertical/horizontal/... list)
It works for English (does it really always ?), because it is simple, right? I don't know, whether it is a real problem in any other language in practice. I just know the code and I think that it is incorrect, inconsistent and illogical.
my impression is that the number of missed/wrong cases for english is so small that it falls within the 'no problem to correct it manually' criteria; languages with compound words, accented characters etc have higher demands
It's maybe not a real problem in practice but it was the biggest pain to reimplement in NTS. It could be actually easier to do it in the 'right way'. So if I have to do it next time I don't want to repeat the same annoyance.

On Wed, 21 Sep 2005, 10:59:33, Taco Hoekwater wrote:
For me the fully restorable read syntax is very important (can I get all [...]
I believe all extra parameters had better be in-line, for optimal flexibility. As much as possible, at least: some information is irretrievably lost in current TeX.
Quite a lot can be solved by adding a new read syntax for character and language nodes, one that does not depend on font and language id numbers. It'll be rather verbose and a tad slow, that is the price you pay for extra flexibility.
Well, inline is convenient to import but might be an overkill if there is a lot of info (like font file name and 'at size') for each character. That can be saved by using references, i.e. outputting a table first and then referencing the ids from the table. For scanning it's not much more work but for writing it requires a pass which collects the table from the references in the data first.
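A minimal sketch of the table-plus-references idea, assuming a made-up line format (`font <id> <name> at <size>pt`, `char <id> <c>`) that is not an agreed syntax. The function name `serialize_with_font_table` and the ids `f0`, `f1`, ... are hypothetical. It shows why the writer needs an extra pass: the table has to be collected from the data before the first node line can be emitted.

```python
def serialize_with_font_table(chars):
    """chars: list of (font_name, at_size_pt, character) tuples.

    Returns the font table lines followed by the character lines,
    each character referring to its font by a short table id.
    """
    # Pass 1: collect every distinct (font, size) pair in first-use order.
    table = {}
    for font, size, _ in chars:
        table.setdefault((font, size), f"f{len(table)}")

    # The table is written first ...
    lines = [f"font {fid} {font} at {size}pt"
             for (font, size), fid in table.items()]
    # Pass 2: ... then the nodes, which only carry the short id.
    lines += [f"char {table[(font, size)]} {ch}" for font, size, ch in chars]
    return lines


example = serialize_with_font_table([
    ("cmr10", 10.0, "F"),
    ("cmr10", 10.0, "r"),
    ("cmti10", 12.0, "t"),
])
```

A reader of this format needs only one pass, since every id is defined before it is used.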
But concerning the metric files, if I want to treat hyphenation locally, then I also need the kerning and ligature programs. In TeX it is done too early (and then it is taken apart and (wrongly) reconstructed during hyphenation pass). I want to do ligatures and kernings on demand, basically after hyphenation (it's not that simple, but anyway).
In current TeX, it is not done too early: ligkerns can influence which line breaks are chosen, so the ligkern programs have to be applied first thing.
Yes, I know, I wrote that it's not that simple. IMO the ligkerns should be considered many times but the final modification of the data should be late. I would postpone it until output and ask dynamically (with maybe some caching) each time it is needed (getting sizes, ...). This approach would need to represent ligature prevention ({}) explicitly as a node.
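A rough sketch of the "ask dynamically, with caching" idea, under strong assumptions: the metrics below are invented toy values (not real cmr10 data), real ligkern programs are richer than a pair table, and `NOLIG` is a hypothetical marker standing for an explicit {} node. The point is only that widths can be computed on demand while the stored character run stays unmodified.

```python
from functools import lru_cache

NOLIG = "\x00"  # hypothetical explicit "no ligature/kern here" node

# Toy metrics, invented for illustration only.
WIDTH = {"f": 3.0, "i": 2.5, "fi": 5.0, "A": 7.5, "V": 7.5}
LIGATURE = {("f", "i"): "fi"}
KERN = {("A", "V"): -1.0}


@lru_cache(maxsize=None)
def measure(run):
    """Width of a tuple of characters with ligatures and kerns applied on demand."""
    glyphs = []
    for item in run:
        if glyphs and (glyphs[-1], item) in LIGATURE:
            glyphs[-1] = LIGATURE[(glyphs[-1], item)]   # form the ligature
        else:
            glyphs.append(item)   # NOLIG stays in place and blocks its neighbours
    width = sum(0.0 if g == NOLIG else WIDTH[g] for g in glyphs)
    width += sum(KERN.get(pair, 0.0) for pair in zip(glyphs, glyphs[1:]))
    return width


w_lig = measure(("f", "i"))
w_nolig = measure(("f", NOLIG, "i"))
w_kern = measure(("A", "V"))
```

Because the run itself is never rewritten, breaking a word at a hyphenation point just means measuring two shorter runs; nothing has to be taken apart and reconstructed.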
NO. It screws up everything, not only taken or potential breaks, but even the potential hyphenation points which are never considered a break.
It does all potential hyphenation points, but that is still a subset of all hyphenation points: absolutely impossible points are ignored (like in the middle of the first line). At least, that's what Knuth's web comments say, and note that is not a feature of the algorithm, only an optimization.
I forgot about the first line, but is there anything else?
Perhaps just a little, but you have a valid case ;-)
Well, we can stop it here and make a unique thread if it is ever needed.
right? I don't know, whether it is a real problem in any other language in practice. I just know the code and I think that it is incorrect, inconsistent and illogical.
It is also near-impossible to fix while maintaining compatibility, which is probably why no-one has seriously attempted to clean up the code, up-til-now.
Sure, I originally wanted to do it in the 'right way' in NTS, but then I realized that was impossible while keeping compatibility and it was a real pain to reimplement it.

On Wed, 21 Sep 2005, 11:26:53, Taco Hoekwater wrote:
Hans Hagen wrote:
so what happens if you remove the optimizations (forget about 100% compatibility)
Probably (hopefully) nothing except some bloat in the data structure, but I won't take bets on that.
But the only optimization is not changing the ligkerns in the first line of the paragraph while hyphenating, right? Then removing that would be even worse, but the difference is so small, anyway; this optimization is really not a problem.
It is also near-impossible to fix while maintaining compatibility, which is probably why no-one has seriously attempted to clean up the code, up-til-now.
but we don't care much about that part of compatibility, do we?
Nah. (but it was a big issue for etex, nts, and pdftex-in-dvi mode)
Exactly, it was a nightmare for me.

On Wed, 21 Sep 2005, 06:25:50, Thanh Han The wrote:
My first thought is that some small modifications to \showlists and \showbox will help a lot. It's easy to write additional info like dimensions of each item in the list, or in case of characters the filename of a tfm with fontsize (or we may write the dimensions of each char as Hans suggested, but this is an overkill IMHO).
If I get the dimensions of characters explicitly, then I don't need to access/know the metric files. But this changes if I want to handle the hyphenation locally (which seems like the only way). Then I also need the ligkerns, so I would either need them explicitly as well (I mean the ligkern programs) -- that would be quite complicated to export (and import) -- or I would need to access the metric files anyway. Therefore the explicit char dimensions seem like a temporary solution only and I don't think it's worth doing.
My feeling is that we need to work out the specification and format of the ``node list'' first. In the first step, I would prefer to have only node-specific things, e.g. only what comes out after a box construction. I also got a similar request: to provide a primitive that writes out the content of a box and another primitive to re-construct that box back from the output. We can start with this and make further extensions later on.
Well, I think that the \showlists output contains everything except the reliable font id (and the language id?) and it is parseable. Well, the syntax could be slightly changed to make it more compatible with the input syntax (or maybe it can be really written in the input syntax) or to be better parseable by a plugin but the information carried by the syntax is the real matter. If I have all referenced fonts explicitly defined at the beginning (with maybe some renaming of the font ids when conflicts arise (can it happen?)) then I'm happy. So with the current syntax it would be something like:

\tenrm=select font cmr10.
\twelveit=select font cmti10 at 12.0pt.
\hbox(6.94444+1.94444)x435.9297, glue set 318.73502fil
.\hbox(0.0+0.0)x0.0
.\tenrm F
.\kern-0.83334
.\tenrm r
.\tenrm e
.\tenrm e
.\tenrm -
.\discretionary
.\tenrm s
.\tenrm h
.\tenrm a
.\tenrm p
.\kern0.27779
.\tenrm e
.\glue 3.33333 plus 1.66666 minus 1.11111
.\twelveit t
.\twelveit e
.\twelveit x
.\twelveit t

or in the input syntax:

\font\tenrm=cmr10 at 10.0pt
\font\twelveit=cmti10 at 12.0pt
\hbox to 435.9297pt {%
\hbox{}%
\tenrm F\kern -0.83334pt r{}e{}e%
[...]
.\twelveit t{}e{}x{}t%

It seems that the input syntax would have to prevent the normal ligkern building, that would be quite awkward. So maybe some customary syntax in between.
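For the \showbox-style variant above, a plugin-side reader might be sketched as follows. This is only an illustration under assumptions: the leading font-definition lines follow the layout proposed in this mail (they are not something TeX writes today), only the node types appearing in the example are recognized, and the names `Node` and `parse_dump` are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                    # "hbox", "char", "kern", "glue", ...
    depth: int                   # nesting level = number of leading dots
    attrs: dict = field(default_factory=dict)


NUM = r"(-?\d+(?:\.\d+)?)"
FONT_DEF = re.compile(r"^\\(\w+)=select font (\w+)(?: at " + NUM + r"pt)?\.$")
BOX = re.compile(r"^\\([hv]box)\(" + NUM + r"\+" + NUM + r"\)x" + NUM)
KERN = re.compile(r"^\\kern ?" + NUM + "$")
GLUE = re.compile(r"^\\glue " + NUM + r"(?: plus (\S+))?(?: minus (\S+))?$")
CHAR = re.compile(r"^\\(\w+) (\S)$")


def parse_dump(text):
    """Return (fonts, nodes) parsed from a showbox-style dump."""
    fonts, nodes = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = FONT_DEF.match(line)
        if m:
            name, tfm, size = m.groups()
            fonts[name] = (tfm, float(size) if size else None)
            continue
        depth = len(line) - len(line.lstrip("."))
        body = line[depth:]
        if m := BOX.match(body):
            kind, h, d, w = m.groups()
            node = Node(kind, depth,
                        {"height": float(h), "depth": float(d), "width": float(w)})
        elif m := KERN.match(body):
            node = Node("kern", depth, {"amount": float(m.group(1))})
        elif m := GLUE.match(body):
            node = Node("glue", depth, {"natural": float(m.group(1)),
                                        "plus": m.group(2), "minus": m.group(3)})
        elif body == r"\discretionary":
            node = Node("discretionary", depth)
        elif m := CHAR.match(body):
            node = Node("char", depth, {"font": m.group(1), "char": m.group(2)})
        else:
            node = Node("unknown", depth, {"raw": body})  # keep what we cannot read
        nodes.append(node)
    return fonts, nodes


SAMPLE = r"""
\tenrm=select font cmr10.
\twelveit=select font cmti10 at 12.0pt.
\hbox(6.94444+1.94444)x435.9297, glue set 318.73502fil
.\hbox(0.0+0.0)x0.0
.\tenrm F
.\kern-0.83334
.\tenrm r
.\discretionary
.\glue 3.33333 plus 1.66666 minus 1.11111
.\twelveit t
"""
fonts, nodes = parse_dump(SAMPLE)
```

Unrecognized lines are kept as "unknown" nodes rather than dropped, so that a round trip back to TeX would not silently lose material.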
At the moment I cannot see clearly what is needed, but I am willing to write some extensions so that we can experiment to see what is really needed and perhaps change what has been done.
I can even play myself and then send a patch (as was suggested). I only need to install the right sources, I'll be grateful for pointing me to them and telling me any building tricks if needed.

Best regards to all,
--ksk
Karel Skoupý wrote:
Hi all,
On Wed, 21 Sep 2005, 10:52:08, Hans Hagen wrote:
OK, but that won't bring much, just some funny shapes.
sure, but on the other hand, it can be used to 'replace' the current par builder by a more advanced (e.g. hyphenation) one, imagine that we have:
\paroutput
  {write list to file (or pipe)
   call plugin in one-paragraph mode
   read list from file (or pipe)}
that way we can replace the current par builder, because by default it's something equivalent to:
\paroutput{\scanlist\expandafter{\the\list255}}
Sorry, I don't understand this very well. \list# is what we want to have, right? And \scanlist is a primitive of eTeX? I'm not used to it, but I believe that the way \scanlist works is the best fit for TeX macro programming. Where can I see any examples to understand it better?
no, both 'list' things are what taco described as 'to do' (there is a scantokens in etex but it's kind of broken)

in the example above (cf taco's previous mail):

\the\list<number> : serializes a list
\scanlist{general text} : 'compiles' serialized list

the main idea i wanted to introduce was the concept of a paragraph output routine

concerning \scantokens (etex) .. this is a different animal,

\def\pqr{pqr}
\edef\abc{def \string\pqr}

now \abc is a sequence of just letters

\scantokens\expandafter{\abc}

this gives "def pqr" because the string'd \pqr is tokenized again
Then, \paroutput should be analogous to \output, right? Then \scanlist\expandafter{\the\list255} would just put the paragraph on the
indeed
current list, right? But where is the paragraph broken by the default algorithm? If it is before activating \paroutput, then it is too late to rebreak (some information is lost and it would be a waste of processor time anyway), so it can only be afterwards, but how would it recognize whether the list needs breaking by the default breaker or not?
this is still open ... maybe we should keep a copy of the original input, i don't know how complex that is, but the list *before* it gets broken, the raw data that enters the par builder, so maybe we should have

\parmode=0 : current behaviour
\parmode=1 : current behaviour but stop at par building, save list in list255
\parmode=2 : current behaviour but do par building, save list in list255

so, with parmode=1, \the\list255 would provide you the raw list, unbroken
Anyway, \output and ending a paragraph are not analogous because \output is asynchronous and \par is synchronous (\paroutput would be activated by \par?). Or do you think that \paroutput should be just for one line when considering a break (that would be terribly complicated).
no, just a way to grab a paragraph and feed it to an external process (or to handle it in tex, whatever that means since the only thing in tex that we then can do is \the\list -)
Sorry, I'm confused, but maybe I just take this line too seriously :-)
right but it's no problem since we need to explore the idea
i wonder how hard this is to implement, you and taco should know -)
Of course handling just one paragraph externally must be much easier than handling several, especially because nothing substantial in the TeX model has to be changed (one atomic operation would be replaced by another one on the same data).
But for my research it has almost no value, I really need to work with whole chunks and layout chains if I want to achieve anything interesting.
i know, but if we have the 'simple one paragraph' one, we can already do a lot of experiments; the next step would be to define higher-level things (not a paragraph but a sequence of areas to fill etc etc)
Not only page crossing, but also column/shape/container crossing ... The problem is that we are used to \parshape, which just specifies something for certain lines in the current paragraph. But if we want to introduce real page layouts, then the shapes are not relative to the paragraphs any more. It will be a matter of formatting where a particular paragraph starts in the layout.
it's a combination:
- a main gutter shape (can be columns or whatever)
- shapes bound to places on the gutter
- shapes bound to specific places in the stream
- shapes that may float (within boundary condition)
That's right.
But concerning the metric files, if I want to treat hyphenation locally, then I also need the kerning and ligature programs. In TeX it is done too early (and then it is taken apart and (wrongly) reconstructed during hyphenation pass). I want to do ligatures and kernings on demand, basically after hyphenation (it's not that simple, but anyway).
how about a font daemon, that one could cache/access font files; we need to go open type anyway so maybe such a daemon can be built on top of existing (non tex) libraries (port 31415)
A real daemon or a C library linked to the application (like kpathsea)? Well, there should be a library which provides a simple interface to a client in any case; that library can (optionally) communicate with a daemon, which is how client-daemon setups usually work.
-)
that's indeed too hard-coded for our purpose, so, next to a font daemon, we need a hyphenation daemon
I would start with a generic interface providing fonts and hyphenation. Whether it accesses the files or communicates with a daemon is an internal matter of the module implementing the interface.
ok
Maybe we should make a whole new glossary. For example, 'node' is quite OK for everything in the list (char, box, glue, penalty, ...), but 'list' is so ambiguous that there should be something more specific (maybe 'node list'). TeX itself doesn't give clear names (classes) for those objects. I had to make up names for them in NTS (to name the classes), maybe we can look into it.
good idea; we indeed need to define proper names and descriptions; can you make a proposal for that based on your nts experiences?
Well the whole problem is, that there are no properly defined data structures with unique type names in TeX (just some all-purpose data structures and macros which one must be extremely careful with). Then there is no explicit need for unique names of different data types. So in NTS, I had to make names for types which are no explicit types in TeX.
The two most outstanding examples are:
* NodeList (nothing new, just the 'node' is always explicit)
* Builder (currently built vertical/horizontal/... list)
it's probably the builder that needs to get an alternative; something parlists and/or shapelists and since it then spawns the task to the plugin .. well, the plugin will have its own data structures, so from that perspective we can keep tex's node list (input for plugin) as well as vertical and horizontal lists (output of plugin) and forget about the rest
It works for English (does it really always ?), because it is simple, right? I don't know, whether it is a real problem in any other language in practice. I just know the code and I think that it is incorrect, inconsistent and illogical.
my impression is that the number of missed/wrong cases for english is so small that it falls within the 'no problem to correct it manually' criteria; languages with compound words, accented characters etc have higher demands
It's maybe not a real problem in practice but it was the biggest pain to reimplement in NTS. It could be actually easier to do it in the 'right way'. So if I have to do it next time I don't want to repeat the same annoyance.
On Wed, 21 Sep 2005, 10:59:33, Taco Hoekwater wrote:
For me the fully restorable read syntax is very important (can I get all
[...]
I believe all extra parameters had better be in-line, for optimal flexibility. As much as possible, at least: some information is irretrievably lost in current TeX.
Quite a lot can be solved by adding a new read syntax for character and language nodes, one that does not depend on font and language id numbers. It'll be rather verbose and a tad slow, that is the price you pay for extra flexibility.
Well, inline is convenient to import but might be an overkill if there is a lot of info (like font file name and 'at size') for each character. That can be saved by using references, i.e. outputting a table first and then referencing the ids from the table. For scanning it's not much more work but for writing it requires a pass which collects the table from the references in the data first.
indeed. some reference is needed; it also makes scanning the result more efficient since the refs are already known then
But concerning the metric files, if I want to treat hyphenation locally, then I also need the kerning and ligature programs. In TeX it is done too early (and then it is taken apart and (wrongly) reconstructed during hyphenation pass). I want to do ligatures and kernings on demand, basically after hyphenation (it's not that simple, but anyway).
In current TeX, it is not done too early: ligkerns can influence which line breaks are chosen, so the ligkern programs have to be applied first thing.
Yes, I know, I wrote that it's not that simple. IMO the ligkerns should be considered many times but the final modification of the data should be late. I would postpone it until output and ask dynamically (with maybe some caching) each time it is needed (getting sizes, ...). This approach would need to represent ligature prevention ({}) explicitly as a node.
a new kind of node indeed
NO. It screws up everything, not only taken or potential breaks, but even the potential hyphenation points which are never considered a break.
It does all potential hyphenation points, but that is still a subset of all hyphenation points: absolutely impossible points are ignored (like in the middle of the first line). At least, that's what Knuth's web comments say, and note that is not a feature of the algorithm, only an optimization.
I forgot about the first line, but is there anything else?
Perhaps just a little, but you have a valid case ;-)
Well, we can stop it here and make a unique thread if it is ever needed.
right? I don't know, whether it is a real problem in any other language in practice. I just know the code and I think that it is incorrect, inconsistent and illogical.
It is also near-impossible to fix while maintaining compatibility, which is probably why no-one has seriously attempted to clean up the code, up-til-now.
Sure, I originally wanted to do it in the 'right way' in NTS, but then I realized that was impossible while keeping compatibility and it was a real pain to reimplement it.
On Wed, 21 Sep 2005, 11:26:53, Taco Hoekwater wrote:
Hans Hagen wrote:
so what happens if you remove the optimizations (forget about 100% compatibility)
Probably (hopefully) nothing except some bloat in the data structure, but I won't take bets on that.
But the only optimization is not changing the ligkerns in the first line of the paragraph while hyphenating, right? Then removing that would be even worse, but the difference is so small, anyway; this optimization is really not a problem.
It is also near-impossible to fix while maintaining compatibility, which is probably why no-one has seriously attempted to clean up the code, up-til-now.
but we don't care much about that part of compatibility, do we?
Nah. (but it was a big issue for etex, nts, and pdftex-in-dvi mode)
Exactly, it was a nightmare for me.
On Wed, 21 Sep 2005, 06:25:50, Thanh Han The wrote:
My first thought is that some small modifications to \showlists and \showbox will help a lot. It's easy to write additional info like dimensions of each item in the list, or in case of characters the filename of a tfm with fontsize (or we may write the dimensions of each char as Hans suggested, but this is an overkill IMHO).
If I get the dimensions of characters explicitly, then I don't need to access/know the metric files. But this changes if I want to handle the hyphenation locally (which seems like the only way). Then I also need the ligkerns, so I would either need them explicitly as well (I mean the ligkern programs) -- that would be quite complicated to export (and import) -- or I would need to access the metric files anyway.
is it really a program or just a list of char combinations representing ligs
Therefore the explicit char dimensions seem like a temporary solution only and I don't think it's worth doing.
My feeling is that we need to work out the specification and format of the ``node list'' first. In the first step, I would prefer to have only node-specific things, e.g. only what comes out after a box construction. I also got a similar request: to provide a primitive that writes out the content of a box and another primitive to re-construct that box back from the output. We can start with this and make further extensions later on.
Well, I think that the \showlists output contains everything except the reliable font id (and the language id?) and it is parseable. Well, the syntax could be slightly changed to make it more compatible with the input syntax (or maybe it can be really written in the input syntax) or to be better parseable by a plugin but the information carried by the syntax is the real matter.
If I have all referenced fonts explicitly defined at the beginning (with maybe some renaming of the font ids when conflicts arise (can it happen?)) then I'm happy.
So with the current syntax it would be something like:
\tenrm=select font cmr10.
\twelveit=select font cmti10 at 12.0pt.
\hbox(6.94444+1.94444)x435.9297, glue set 318.73502fil
.\hbox(0.0+0.0)x0.0
.\tenrm F
.\kern-0.83334
.\tenrm r
.\tenrm e
.\tenrm e
.\tenrm -
.\discretionary
.\tenrm s
.\tenrm h
.\tenrm a
.\tenrm p
.\kern0.27779
.\tenrm e
.\glue 3.33333 plus 1.66666 minus 1.11111
.\twelveit t
.\twelveit e
.\twelveit x
.\twelveit t

or in the input syntax:

\font\tenrm=cmr10 at 10.0pt
\font\twelveit=cmti10 at 12.0pt
\hbox to 435.9297pt {%
\hbox{}%
\tenrm F\kern -0.83334pt r{}e{}e%
[...]
.\twelveit t{}e{}x{}t%
It seems that the input syntax would have to prevent the normal ligkern building, that would be quite awkward.
you mean

.\twelveit t
.\nolig
.\twelveit e
.\nolig
.\twelveit x
.\nolig
.\twelveit t
So maybe some customary syntax in between.
At the moment I cannot see clearly what is needed, but I am willing to write some extensions so that we can experiment to see what is really needed and perhaps change what has been done.
I can even play myself and then send a patch (as was suggested). I only need to install the right sources, I'll be grateful for pointing me to them and telling me any building tricks if needed.
Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
participants (3)
- Hans Hagen
- Karel Skoupý
- Thanh Han The