Hi, it has been a busy day. On 09/05/2010 04:39 PM, taco@metatex.org wrote:
New Revision: 3857
Log: add some virtual accessors to fontloader.open() return data
Here is the relevant new section in the manual:

%%% start manual

As mentioned earlier, the return value of \type{fontloader.open()} is a userdata object. In \LUATEX\ versions before 0.63, the only way to have access to the actual metrics was to call \type{fontloader.to_table()} on this object, returning a table structure that is explained in the following subsections.

However, it turns out that the result from \type{fontloader.to_table()} sometimes needs very large amounts of memory (depending on the font's complexity and size), so starting with \LUATEX\ 0.63 it will incrementally become possible to access the userdata object directly. In \LUATEX\ 0.63.0, the following is implemented:

\startitemize
\item all top-level keys that would be returned by \type{to_table()} can also be accessed directly.
\item the top-level key \quote{glyphs} returns a {\it virtual\/} array that allows indices from \type{0} to ($\type{f.glyphmax}-1$).
\item the items in that virtual array (the actual glyphs) are themselves also userdata objects, and each has accessors for all of the keys explained in the section \quote{Glyph items} below.
\item the top-level key \quote{subfonts} returns an {\it actual} array of userdata objects, one for each of the subfonts (or nil, if there are no subfonts).
\stopitemize

A short example may be helpful. This code generates a printout of all the glyph names in the font \type{PunkNova.kern.otf}:

\starttyping
local f = fontloader.open('PunkNova.kern.otf')
print(f.fontname)
local i = 0
while (i < f.glyphmax) do
  local g = f.glyphs[i]
  if g then
    print(g.name)
  end
  i = i + 1
end
fontloader.close(f)
\stoptyping

In this case, the \LUATEX\ memory requirement stays below 100MB on the test computer, while the internal structure generated by \type{to_table()} needs more than 2GB of memory (the font itself is 6.9MB in disk size).
In \LUATEX\ 0.63 only the top-level font, the subfont table entries, and the glyphs are virtual objects; everything else still produces normal Lua values and tables. In future versions, more return values will be replaced by userdata objects (as much as needed to keep the memory requirements in check).

%%% end manual

Of course the memory savings will be smaller when more of the font information is used than just the glyph names, but it does seem to help quite a lot already (assuming the characters are converted into a metrics structure one at a time).

The reason for this message is this: which items need to be virtualized, and which ones can easily be left alone?

For example, <glyph>.boundingbox returns an array of 4 integer numbers. It seems to me that it does not make much sense to write a dedicated userdata object for such boundingboxes: the method call overhead will probably outweigh the gain from using less memory.

On the other hand, <glyph>.kerns can be enormous (as it is in the punknova.kern case) and should probably be converted.

Does anybody want to think about a shortlist of such items?

Best wishes, Taco
[...]
The reason for this message is this: what are items that need to be virtualized, and which ones can easily be left alone?
For example
<glyph>.boundingbox
returns an array of 4 integer numbers. It seems to me that it makes not that much sense to write a dedicated userdata object for such boundingboxes: the method call overhead will probably outweigh the gain from using less memory.
On the other hand,
<glyph>.kerns
can be enormous (as it is in the punknova.kern case) and should probably be converted.
Does anybody want to think about a shortlist of such items?
Wow! Thank you for preserving the object access interface; it seems I don't need to change anything.

Besides glyph.kerns, I'd vote to virtualize glyph.lookups, as this is the _most_ scary part, especially in non-latin fonts. Possibly also glyph.anchors.

Regarding mappings, I think that each of the following

loaded_font.map.map
loaded_font.map.backmap
loaded_font.map.enc

could be available on demand.

Regards
-- Paweł Jackowski P.Jackowski@gust.org.pl
On 7-9-2010 8:12, Paweł Jackowski wrote:
[...]
The reason for this message is this: what are items that need to be virtualized, and which ones can easily be left alone?
For example
<glyph>.boundingbox
returns an array of 4 integer numbers. It seems to me that it makes not that much sense to write a dedicated userdata object for such boundingboxes: the method call overhead will probably outweigh the gain from using less memory.
On the other hand,
<glyph>.kerns
can be enormous (as it is in the punknova.kern case) and should probably be converted.
Does anybody want to think about a shortlist of such items?
Wow! Thank you for preserving objects access interface; it seems I don't need to change anything.
It all depends on usage ... I need to change a lot (to uglier code, actually), and we're only talking about a partial userdata -)
Among glyph.kerns I'd vote to virtualize glyph.lookups, as this is the _most_ scary part, especially in non-latin fonts. Possibly also glyph.anchors
it doesn't make it less scary, does it?
Regarding mappings, I think that every of the following
loaded_font.map.map loaded_font.map.backmap loaded_font.map.enc
I've experimented a lot with these things (mem consumption, speed, etc) and there is quite a trade-off:

- accessing them as userdata each time they are needed takes a function call and is slower than accessing a table
- most tables are not that large and hardly give overhead
- as soon as you want to have the data at the Lua end more permanently, you have to do quite some (inefficient) table construction
- there is no gain in, for instance, the map tables; if one needs them, one often needs the whole table, and unless you make a copy you then need to keep the font object in memory

It all depends on how you use fonts:

(1) just consult them ... the current userdata saves much mem, as one can access selectively
(2) construct a tex font (tfm table) ... one stepwise fills the tfm data structure, keeps whatever is needed around, and then closes the fontloader object
(3) idem, but keeping the object open

In case 2 there will be the penalty of constructing tables from userdata, but we win in less memory consumption of glyph data (if not used at the Lua end); in case 3 you basically lose the gain, as you need to keep the whole font in mem anyway -- ok, not in table form, but one will probably also cache some info, so that duplicates and the gain gets lost.

The advantage of the userdata for glyphs and the root tables is that one delays table conversion, but eventually one will need much of the data; at least it can be fetched selectively. The userdata keeps the memory footprint low in the sense that less intermediate data is needed.

So, it's kind of a mix ... partial userdata helps, but too much of it works against us.

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
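[Editorial note: the trade-off described above can be sketched in plain Lua. This is a toy stand-in for the real fontloader userdata, with invented names (`store`, `view`, `to_table`), not the actual implementation: a virtual `__index` view avoids building a big table up front (case 1), but keeping the data around at the Lua end means rebuilding exactly the table the userdata tried to avoid (case 2).]

```lua
-- Toy model of the trade-off: 'store' stands in for font data kept at the
-- C end; all names here are invented for illustration.
local store = {}
for i = 0, 999 do store[i] = { name = "glyph" .. i, width = i } end

-- (1) consult selectively: a virtual view, no table is built up front
local view = setmetatable({}, {
  __index = function(_, i) return store[i] end,
})
assert(view[10].name == "glyph10")  -- only this entry is touched

-- (2) keep the data at the Lua end: you must copy it out, which
-- reconstructs the very table the userdata access tried to avoid
local function to_table(v, n)
  local t = {}
  for i = 0, n - 1 do
    local g = v[i]
    t[i] = { name = g.name, width = g.width }
  end
  return t
end
local copy = to_table(view, 1000)
assert(copy[999].width == 999)
```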
Hi, On 09/07/10 22:45, Hans Hagen wrote:
the advantage of the userdata for glyphs and the root tables is that one delays table conversion but eventually one will need much of the data but at least it can be fetched selectively; the userdata keeps the memory footprint low in the sense that less intermediate data is needed
The glyph kerning information is a good example: usually, when processing a fontloader font into a metrics table, you need access to all of that data (at least per glyph). Therefore, there is not that much point in totally virtualizing the 'kerns' entry, as then each access to it becomes a method call, which will slow down processing considerably, and the end result is likely to use a fair amount of memory anyway.

Here is the current format of 'kerns', with punknova.kern.otf as example:

<f>.glyphs[65].kerns={
  {
    ["char"]="quotedblright.9",
    ["lookup"]={
      "pp_l_0_g_4",
      "pp_l_0_g_5",
      ... 94 more entries ...
    },
    ["off"]=24,
  },
  {
    ["char"]="quotedblleft.9",
  ...

There is not a lot to be gained from virtualizing this, even though the dump of each of the per-glyph kerns tables is nearly 2 megabytes. The reason is that you typically need code like this:

for _,v in pairs(<f>.glyphs[0].kerns) do
  local n = v.char
  local off = v.off
  for _,l in pairs(v.lookup) do
    ...
  end
end

So you are accessing everything (and likely will even convert it to a temporary table at some point). In punknova.kern, there tend to be 1242 top-level kern array entries, each of which has a little under a hundred keys. That would be over 100k worth of metatable lookups, and that will definitely be slower than accessing the actual table that is returned at the moment.

Best wishes, Taco
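[Editorial note: the lookup-overhead argument above can be illustrated with a standalone sketch. This is not the fontloader's internals; `plain`, `hidden`, and `virtual` are invented names. It contrasts a plain table (as `to_table()` returns) with a "virtual" object where every field access goes through an `__index` function call, as it would with per-item userdata accessors.]

```lua
-- plain table, as to_table() would return it
local plain = { char = "quotedblright.9", off = 24 }

-- virtual object: the same data hidden behind an __index function
local hidden = { char = "quotedblright.9", off = 24 }
local virtual = setmetatable({}, {
  __index = function(_, k) return hidden[k] end,
})

-- both expose the same values ...
assert(plain.off == virtual.off)

-- ... but every virtual access is a function call; over ~100k lookups
-- (1242 entries x ~100 keys each) that per-access overhead dominates
local function sum_off(t, n)
  local s = 0
  for _ = 1, n do s = s + t.off end
  return s
end

-- same result either way, but the virtual variant pays a call per access
assert(sum_off(plain, 100000) == sum_off(virtual, 100000))
```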
Hi, On 09/08/2010 07:35 AM, Taco Hoekwater wrote:
In punknova.kern, there tend to be 1242 top-level kern array entries, each of which has about a little under a hundred keys. That would be over 100k worth of metatable lookups, and that will definitely be slower than accessing the actual table that is returned at the moment.
Also, each of those metatable calls would introduce a new userdata object to be garbage collected.

Hans and I just did some tests, and it seems that the userdata access is useful if *but only if* you are very low on memory. In other cases, it just adds extra objects to be garbage collected, which makes the collector slower. That is on top of extra time spent on the actual calls. Even worse: those extra gc objects tend to be scattered around in memory, resulting in extra minor page faults (cpu cache misses), and all that has a noticeable effect on run speed: the metatable-based access is 20-30% slower than the old massive to_table.

Therefore, there seems little point in expanding the metadata functionality any further. What is there will stay, but adding more metadata objects appears to be a waste of time on all sides.

Best wishes, Taco
On 8-9-2010 6:57, Taco Hoekwater wrote:
Hi,
On 09/08/2010 07:35 AM, Taco Hoekwater wrote:
In punknova.kern, there tend to be 1242 top-level kern array entries, each of which has about a little under a hundred keys. That would be over 100k worth of metatable lookups, and that will definitely be slower than accessing the actual table that is returned at the moment.
Also, each of those metatable calls would introduce a new userdata object to be garbage collected.
Hans and I just did some tests, and it seems that the userdata access is useful if *but only if* you are very low on memory. In other cases, it just adds extra objects to be garbage collected, which makes the collector slower. That is on top of extra time spent on the actual calls, and even worse: those extra gc objects tend to be scattered around in memory, resulting in extra minor page faults (cpu cache misses) and all that has a noticeable effect on run speed: the metatable-based access is 20-30% slower than the old massive to_table.
Therefore, there seems little point in expanding the metadata functionality any further. What is there will stay, but adding more metadata objects appears to be a waste of time on all sides.
Quite some time went into rewriting the mkiv font loading code to do these tests, and as I don't want to throw it away, mkiv will provide several loading methods, but the table based one will be the default. We will wrap up some stats in a (maybe mapsable) article some day, although a sound strategy for when to use what method (as Taco explained) is not possible.

Hans
Taco Hoekwater wrote: [...]
Hans and I just did some tests, and it seems that the userdata access is useful if *but only if* you are very low on memory. In other cases, it just adds extra objects to be garbage collected, which makes the collector slower. That is on top of extra time spent on the actual calls, and even worse: those extra gc objects tend to be scattered around in memory, resulting in extra minor page faults (cpu cache misses) and all that has a noticeable effect on run speed: the metatable-based access is 20-30% slower than the old massive to_table.
Interesting. As you said, much depends on how one uses / tests. I made some tests with translating the fontforge font structure directly into the luatex font table without an intermediate form. This gives the best results, but obviously requires too many simplifications.
Therefore, there seems little point in expanding the metadata functionality any further. What is there will stay, but adding more metadata objects appears to be a waste of time on all sides.
OK, thank you. -- Paweł Jackowski P.Jackowski@gust.org.pl
On 9-9-2010 9:07, Paweł Jackowski wrote:
Taco Hoekwater wrote: [...]
Hans and I just did some tests, and it seems that the userdata access is useful if *but only if* you are very low on memory. In other cases, it just adds extra objects to be garbage collected, which makes the collector slower. That is on top of extra time spent on the actual calls, and even worse: those extra gc objects tend to be scattered around in memory, resulting in extra minor page faults (cpu cache misses) and all that has a noticeable effect on run speed: the metatable-based access is 20-30% slower than the old massive to_table.
Interesting. As you said, much depends on how one uses / tests. I made some tests with translating the fontforge font structure directly into the luatex font table without an intermediate form. This gives the best results, but obviously requires too many simplifications.
That is indeed where there can be some gain (no intermediate table, but compensated by function calls), but as soon as you start keeping some extra info around (not passed to tex, or manipulated in between) it quickly gets worse due to the mentioned scattering. Then there can be a smaller mem footprint at the cost of some extra runtime.

On the other hand, using the traditional method (table) and invoking the garbage collector every now and then gives an equally low memory footprint, but a sweep takes some time as well. Interesting is that on some test runs I can get about twice the performance using tables, but it's not that easy to figure out why.

Btw, in the table approach one can manipulate the font before conversion, something that is less easy in the userdata variant.

Anyhow, what we provide now is a nice compromise.

Hans
Interesting is that on some test runs I can get about twice the performance using tables but it's not that easy to figure out why.
I've learned just one thing about optimizations: my intuition fails, so only practical tests make sense.
Btw, in the table approach one can manipulate the font before conversion, something that is less easy in the userdata variant.
Anyhow, what we provide now is a nice compromise
Sure, agree. -- Paweł Jackowski P.Jackowski@gust.org.pl
participants (3)
-
Hans Hagen
-
Paweł Jackowski
-
Taco Hoekwater