Determining the script from the text is not hard.
It has been done in many projects.
On Sun, Dec 02, 2012 at 10:58:56AM +0100, Steve White wrote:
> Hi all,
>
> I finally got something like Pablo's test working on my system. It doesn't
> show much new. As had already been established, with the right ConTeXt
> switches, OpenType features of kerning and ligatures work correctly with
> FreeSerif.
>
> Find attached. If there's a better way to do this, please comment: I may
> put some of this in the FreeFront usage notes. (Hm... I may tighten the
> italic y a bit.)
>
> A question remains: Why does ConTeXt (like some other TeX derivatives that
> use OpenType) not determine the OpenType script of runs of text from the
> Unicode (or other encoding) character range? All other font layout systems
> I know of do this. (Remember- a run of text in the OpenType sense is not
> the same as the scope of a TeX environment, it is typically a word,
> separated by white space or punctuation.)
Determining the script of a run of text is not that simple. Take
"english (ARABIC.)": to which script should the parentheses and the
period be assigned? (They have the "Common" script property in Unicode
and are not assigned to any given script.) Unicode Standard Annex #24
provides an algorithm to handle this, which an engine should implement:
http://www.unicode.org/reports/tr24/
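To make the ambiguity concrete, here is a minimal Python sketch. It is not
the procedure from the annex and not what ConTeXt or any other engine does.
Everything in it is an assumption made for illustration: the names
TOY_RANGES, toy_script and script_runs are hypothetical, the script table
covers only basic Latin letters and a few Arabic letters, and the rule
"a Common character joins the preceding run" is just one naive heuristic.

```python
# Toy script table: a tiny hand-picked subset, for illustration only.
# Real code would load the full Scripts.txt data from the Unicode
# Character Database instead.
TOY_RANGES = [
    (0x0041, 0x005A, "Latin"),   # A-Z
    (0x0061, 0x007A, "Latin"),   # a-z
    (0x0621, 0x063A, "Arabic"),  # some Arabic letters
    (0x0641, 0x064A, "Arabic"),  # some more Arabic letters
]


def toy_script(ch):
    """Return the toy script name of one character, or "Common"."""
    cp = ord(ch)
    for lo, hi, name in TOY_RANGES:
        if lo <= cp <= hi:
            return name
    return "Common"


def script_runs(text):
    """Split text into (script, substring) runs with a naive heuristic."""
    scripts = [toy_script(ch) for ch in text]

    # Pass 1: a Common character takes the script of the preceding
    # character that has a real script, if there is one.
    prev = None
    for i, s in enumerate(scripts):
        if s == "Common":
            if prev is not None:
                scripts[i] = prev
        else:
            prev = s

    # Pass 2: leading Common characters take the first real script found.
    first = next((s for s in scripts if s != "Common"), "Common")
    scripts = [first if s == "Common" else s for s in scripts]

    # Group adjacent characters that ended up with the same script.
    runs = []
    for ch, s in zip(text, scripts):
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + ch)
        else:
            runs.append((s, ch))
    return runs


# "english (ARABIC.)" with four real Arabic letters, written as escapes.
sample = "english (\u0639\u0631\u0628\u064a.)"
for script, chunk in script_runs(sample):
    print(script, repr(chunk))
```

With this heuristic the opening parenthesis ends up in the Latin run while
the period and the closing parenthesis end up in the Arabic run, so the
matching pair is split across two scripts. That is exactly the kind of case
that a plain character-range lookup cannot settle on its own.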
Regards,
Khaled
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage : http://www.pragma-ade.nl / http://tex.aanhet.net
archive : http://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___________________________________________________________________________________