ASCII input - non ASCII output
Hi,
Is there a way in ConTeXt to define a mapping, inside the TeX system itself, from ASCII *text* input (in the source .tex file) to non-ASCII output?
For example, if I put the two ASCII characters "dj" in the .tex file, can I get the Cyrillic character "ђ" in the PDF? And so on: for input b, v, g, d, ... get output б, в, г, д, ...
Or, more generally, is there a way to define how every Unicode letter/string should be read?
This would benefit users of non-ASCII languages, because they would not need to switch keyboard layouts all the time between commands, math input and text input.
In LaTeX, the fontenc package (more precisely, the OT2 encoding) does this.
Minimal example:
\starttext
a, b, v, g, d, dj, e, zh, z
\stoptext
should produce
a, б, в, г, д, ђ, е, ж, з
Best regards,
Sava Maksimovic (Сава Максимовић :) )
Dear Sava, On 7 November 2017 at 13:48, Sava Maksimović wrote:
Hi,
Is there a way in ConTeXt to define a mapping, inside the TeX system itself, from ASCII text input (in the source .tex file) to non-ASCII output?
For example, if I put the two ASCII characters "dj" in the .tex file, can I get the Cyrillic character "ђ" in the PDF? And so on: for input b, v, g, d, ... get output б, в, г, д, ...
Or, more generally, is there a way to define how every Unicode letter/string should be read?
ConTeXt can do that with some additional tricks (font features) in Lua. (I don't know the code by heart, but I assume someone else will answer that.) But you'll have to wrap everything that you want transliterated in blocks, so instead of switching the keyboard, you'll likely have to type additional commands. (Maybe it would work satisfactorily without having to switch too often, but I would probably not want to do that and would prefer to go for Unicode.)
This would benefit users of non-ASCII languages, because they would not need to switch keyboard layouts all the time between commands, math input and text input.
Keep in mind that you could in principle also translate the command names, so that you could use (excuse me, it's probably grammatically incorrect):
\почнитекст
a, б, в, г, д, ђ, е, ж, з
\завршитекст
In LaTeX, the fontenc package (more precisely, the OT2 encoding) does this.
Minimal example:
\starttext
a, b, v, g, d, dj, e, zh, z
\stoptext
should produce
a, б, в, г, д, ђ, е, ж, з
Just curious: why do you use "zh" instead of "ž"? Mojca
On 11/07/2017 01:48 PM, Sava Maksimović wrote:
Is there a way in ConTeXt to define a mapping, inside the TeX system itself, from ASCII *text* input (in the source .tex file) to non-ASCII output?
For example, if I put the two ASCII characters "dj" in the .tex file, can I get the Cyrillic character "ђ" in the PDF? And so on: for input b, v, g, d, ... get output б, в, г, д, ...
Or, more generally, is there a way to define how every Unicode letter/string should be read?
This would benefit users of non-ASCII languages, because they would not need to switch keyboard layouts all the time between commands, math input and text input.
When MkIV was in its infancy, Hans helped me write something like this for my Greek module. It basically applies a Lua string.gsub to the input to produce and typeset UTF-8 output. But I pretty soon gave up using it. We're in the twenty-first century, and this sort of trickery really is not needed any more. And, as Mojca has said, you would have to delimit your text, since you don't want your ConTeXt commands to be transliterated as well. I can send you the relevant code if you want, and you could adapt it to your case. But I would advise against it. In the long run, changing keyboards is less hassle than this sort of semi-solution to an obsolete problem.
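For readers who want to see what the string.gsub idea amounts to, here is a minimal sketch in plain Lua. It is not the code from the Greek module; the function name tocyrillic and both tables are hypothetical, and the mapping only covers the letters from the minimal example. Digraphs are replaced first so that "dj" is not split into "d" and "j".

```lua
-- Hypothetical, partial Serbian Latin -> Cyrillic mapping (illustration only).
-- Digraphs are kept in an ordered list so they are handled before single letters.
local digraphs = {
    { "dj", "ђ" },
    { "zh", "ж" },
}

local singles = {
    a = "а", b = "б", v = "в", g = "г", d = "д", e = "е", z = "з",
}

-- Transliterate a string: digraphs first, then the remaining single letters.
local function tocyrillic(text)
    for _, pair in ipairs(digraphs) do
        text = string.gsub(text, pair[1], pair[2])
    end
    -- [a-z] only matches ASCII bytes, so the UTF-8 bytes produced above are
    -- never touched; letters without a table entry are left unchanged.
    text = string.gsub(text, "[a-z]", singles)
    return text
end

print(tocyrillic("a, b, v, g, d, dj, e, zh, z"))
--> а, б, в, г, д, ђ, е, ж, з
```

Inside ConTeXt such a function would live between \startluacode and \stopluacode and would only be applied to explicitly delimited text, for the reason given above: commands must not be transliterated. Uppercase letters and the rest of the alphabet are left out of this sketch.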
In LaTeX, the fontenc package (more precisely, the OT2 encoding) does this.
Yes, LaTeX stays firmly in the 1970s. But the world has moved on. Thomas
On Tue, 7 Nov 2017 20:38:14 +0100, Thomas A. Schmitz wrote:
In LaTeX, the fontenc package (more precisely, the OT2 encoding) does this.
Yes, LaTeX stays firmly in the 1970s. But the world has moved on.
And LaTeX has moved on too. You can use LuaTeX and UTF-8 input with it without problems, and for the Unicode engines a Unicode font encoding and OpenType fonts are the default in the kernel. That you still *can* use special font encodings to mimic transliteration, 8-bit engines and Type 1 fonts with LaTeX doesn't mean that you *have* to use them. You have the choice.
--
Ulrike Fischer
http://www.troubleshooting-tex.de/
On 7 November 2017 at 20:38, Thomas A. Schmitz wrote:
When mkiv was in its infancy, Hans helped me in writing something like this for my Greek module. It basically applies a Lua string.gsub to the input to produce and typeset utf8 output.
This would be done with font features now. So it would in fact not be applied at input (which is more dangerous), but rather before the characters end up in the PDF, which makes a lot more sense anyway. The documentation is here, but there should be plenty more examples: http://pragma-ade.com/general/manuals/fonts-mkiv.pdf
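A rough sketch of what such a font feature could look like, based on the fonts.handlers.otf.addfeature interface described in the fonts manual. The feature names srdigraphs and srletters are made up, the mapping is partial, and addressing glyphs by Unicode code point instead of by glyph name is an assumption that should be checked against the manual. The chosen font must of course contain the Cyrillic glyphs.

```lua
-- Meant to run inside ConTeXt MkIV (between \startluacode and \stopluacode),
-- where the global `fonts` table exists. Feature names are hypothetical.

-- Digraphs as "ligatures": a sequence of input characters becomes one glyph.
fonts.handlers.otf.addfeature {
    name = "srdigraphs",
    type = "ligature",
    data = {
        [0x0452] = { 0x0064, 0x006A },  -- d j -> ђ
        [0x0436] = { 0x007A, 0x0068 },  -- z h -> ж
    },
}

-- Single letters as one-to-one substitutions.
fonts.handlers.otf.addfeature {
    name = "srletters",
    type = "substitution",
    data = {
        [0x0061] = 0x0430,  -- a -> а
        [0x0062] = 0x0431,  -- b -> б
        [0x0076] = 0x0432,  -- v -> в
        [0x0067] = 0x0433,  -- g -> г
        [0x0064] = 0x0434,  -- d -> д
        [0x0065] = 0x0435,  -- e -> е
        [0x007A] = 0x0437,  -- z -> з
    },
}
```

The features would then be switched on from the TeX side, for instance through a \definefontfeature set such as [srdigraphs=yes,srletters=yes], and only for the text that should be transliterated. Whether the digraph rule is applied before the single-letter rule depends on feature ordering, which this sketch does not address.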
I can send you the relevant code if you want, and you could adapt it to your case.
I'm pretty sure it's outdated if it was written at the infancy stage of mkiv.
But I would advise against it. In the long run, changing keyboards is less hassle than this sort of semi-solution to an obsolete problem.
One reason why I would not abandon it *immediately* for Serbian is that Serbian can actually be written in both scripts and there are straightforward rules for transliteration (it's not exactly a one-to-one character mapping, but those "dj"s should be easy enough to handle, in particular because all the required digraphs also exist in Unicode, or maybe it's more difficult exactly because of that). The fun part is that they cannot decide which script to use themselves :), so you end up with schoolmates using different scripts for their lecture notes. I'm still not arguing that this is the most brilliant idea, but I can totally imagine a Serbian professor wanting to "auto-generate" a Cyrillic version of his book on top of the Latin edition with close-to-zero extra effort. Greek, in contrast, hardly makes any sense when written in the Latin alphabet.
Mojca
On 11/08/2017 11:34 AM, Mojca Miklavec wrote:
I'm still not arguing that this is the most brilliant idea, but I can totally imagine a Serbian professor wanting to "auto-generate" a Cyrillic version of his book on top of the Latin edition with close-to-zero extra effort.
Ok, I can see that this may be a convenient way of producing different output from the same source; I wasn't aware of this (and I was somewhat provocative about LaTeX, of course :-). From a conceptual point of view, it still feels a bit hackish to do these things on the font level, because they are not/should not be tied to specific fonts: you'd have to rewrite your features or goodies or whatever they are called now for every font you want to use (and you may run into a number of funny inconsistencies in character names or even Unicode slots).
Thomas
On 8 November 2017 at 15:36, Thomas A. Schmitz wrote:
On 11/08/2017 11:34 AM, Mojca Miklavec wrote:
I'm still not arguing that this is the most brilliant idea, but I can totally imagine a Serbian professor wanting to "auto-generate" a Cyrillic version of his book on top of the Latin edition with close-to-zero extra effort.
Ok, I can see that this may be a convenient way of producing different output from the same source; I wasn't aware of this (and I was somewhat provocative about LaTeX, of course :-). From a conceptual point of view, it still feels a bit hackish to do these things on the font level, because they are not/should not be tied to specific fonts: you'd have to rewrite your features or goodies or whatever they are called now for every font you want to use (and you may run into a number of funny inconsistencies in character names or even Unicode slots).
Now for a bit of off-topic-ness.
Trivia. (Ignoring the attempts to make our own national keyboard) we are using the "Croatian" keyboard (which is probably the same as the Serbian layout), which has all the relevant-for-TeX keys ({}[]\) on the third plane (alt-gr+<something>), but I learnt computer programming on a US keyboard and preferred the US layout to those strange keys in the third plane. In computer programming there's basically never a need for non-ASCII characters. And when writing texts in a native language there's no need for those strange backslashes, so life was mostly good until I started using TeX in UTF-8. Back then I was basically switching the keyboard a couple of times per sentence (if not per word) and somewhat hated typing any TeX in my native language for that reason. Then I switched to Dvorak and made myself a special layout. Now I have all the special keys from the US keyboard easily accessible and all those strange non-ASCII characters on the third plane (alt-gr-C to get "Č"). That works much better for me now. So at least I know the pain of constantly having to switch layouts. Nevertheless, I would still say that it makes more sense to put some effort into getting nice UTF-8 documents. (Except, again, giving Serbian a bit of an exception due to the fact that the document would still be valid and perfectly readable in its Latin form.)
One could argue in the other direction as well. It should be pretty straightforward to "transliterate" all ConTeXt commands into Cyrillic (ok, I have no clue what people usually do with q, x, y, w, ... but I'm sure there's a solution for that as well) and simply use English commands in Cyrillic script to simplify typing :) :) :)
Mojca
How about using the "Transliterator" module by Philipp Gesang? https://modules.contextgarden.net/cgi-bin/module.cgi/ruid=6004710974/action=... It comes with TeX Live and ConTeXt standalone.
participants (5): Henri, Mojca Miklavec, Sava Maksimović, Thomas A. Schmitz, Ulrike Fischer