Ahoi,

I’ve one annoying problem with ConTeXt: all üs (small u umlauts) seem to be encoded as decomposed Unicode or something like that. At least, every ü breaks into u + garbage if I copy some text from a ConTeXt PDF to an app that doesn’t really support Unicode.

All other characters within Latin-1, including the other umlauts, are no problem; that’s why I think the problem might be in ConTeXt’s font handling.

(Actually, this is about my invoice addresses that I copy from PDF to the German Post postage webshop. The site is quite new, and I can’t understand how a big company can buy such crappy software. I already complained, since there are more problems, but of course I only got a template answer.)

Greetlings, Hraban
---
http://www.fiee.net
http://wiki.contextgarden.net
GPG Key ID 1C9B22FD
On 20 March 2018 at 08:42, Henning Hraban Ramm wrote:
> Ahoi,
> I’ve one annoying problem with ConTeXt: all üs (small u umlauts) seem to be encoded as decomposed unicode or something like that, at least every ü breaks into u + garbage if I copy some text from a ConTeXt PDF to an app that doesn’t really support Unicode.

You are on macOS, right?

In my experience it was usually Apple's technology to blame. Perfectly valid PDFs with proper accented characters would always end up with decomposed characters when copy-pasting. Even pdftotext did a better job.

Mojca
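
For readers unfamiliar with the term, here is a minimal Python sketch of what "decomposed" means and why a Latin-1-only application shows "u + garbage". The helper names `describe` and `to_latin1` are made up for this illustration, and the Latin-1 encoding step only approximates what a non-Unicode application might do with pasted text.

```python
import unicodedata

# ü as a single precomposed code point (NFC form).
composed = "\u00fc"
# The same character in decomposed form (NFD): u followed by a combining diaeresis.
decomposed = unicodedata.normalize("NFD", composed)

def describe(text):
    """List the code points of a string with their Unicode names."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

def to_latin1(text):
    """Mimic a Latin-1-only target: unencodable characters become '?'."""
    return text.encode("latin-1", errors="replace")

print(describe(composed))     # one code point
print(describe(decomposed))   # two code points
print(to_latin1(composed))    # the precomposed ü fits into Latin-1
print(to_latin1(decomposed))  # the combining mark does not, so it is replaced
```

Both strings render identically as ü, but only the precomposed form survives in a Latin-1 context.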
On Thu, Mar 22, 2018 at 10:08:44AM +0100, Mojca Miklavec wrote:
> On 20 March 2018 at 08:42, Henning Hraban Ramm wrote:
>> I’ve one annoying problem with ConTeXt: all üs (small u umlauts) seem to be encoded as decomposed unicode or something like that, at least every ü breaks into u + garbage if I copy some text from a ConTeXt PDF to an app that doesn’t really support Unicode.
> You are on macOS, right?
> In my experience it was usually Apple's technology to blame.

I agree with you that Apple’s software has a tendency to decompose characters, but I wouldn’t blame them for that: it’s perfectly Unicode-compliant to do so, and by now software should support combining characters in at least a basic way. It’s a real problem that the software from the Deutsche Post isn’t able to handle them correctly.

Best,
Arthur
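
As a possible workaround on the user's side (not something proposed in this thread), copied text could be recomposed to NFC before it is pasted into software that cannot handle combining characters. The sketch below uses Python's standard `unicodedata` module; the function name `recompose` and the sample address string are hypothetical.

```python
import unicodedata

def recompose(text: str) -> str:
    """Convert text to NFC so that base letter + combining mark pairs
    become single precomposed characters where such characters exist."""
    return unicodedata.normalize("NFC", text)

# Hypothetical sample, as it might arrive from a viewer that decomposes umlauts.
pasted = "Mu\u0308ller, Ko\u0308ln"
clean = recompose(pasted)

print(len(pasted), len(clean))  # the recomposed string is two code points shorter
print(clean.encode("latin-1"))  # now encodable as Latin-1 without replacement
```

This only helps for characters that have a precomposed form in Unicode; other combinations stay decomposed.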
On 2018-03-25 at 22:36, Arthur Reutenauer wrote:
> On Thu, Mar 22, 2018 at 10:08:44AM +0100, Mojca Miklavec wrote:
>> On 20 March 2018 at 08:42, Henning Hraban Ramm wrote:
>>> I’ve one annoying problem with ConTeXt: all üs (small u umlauts) seem to be encoded as decomposed unicode or something like that, at least every ü breaks into u + garbage if I copy some text from a ConTeXt PDF to an app that doesn’t really support Unicode.
>> You are on macOS, right? In my experience it was usually Apple's technology to blame.
> I agree with you that Apple’s software has a tendency to decompose characters, but I wouldn’t blame them for that: it’s perfectly Unicode-compliant to do so, and by now software should support combining characters in at least a basic way. It’s a real problem that the software from the Deutsche Post isn’t able to handle them correctly.
While the DP shop should be able to handle more than Latin-1, the problem seems to be in the viewer or in a combination of viewer and OS:

- It doesn’t depend on the font: I tried Computer Modern and Alegreya (which is known to have some OpenType ligature issues).
- I checked with several viewers, and the Adobe apps (Acrobat Pro 9 and Reader DC) decompose just the ü, while my other viewers, including Apple’s Preview, decompose all the umlauts. (I just copied and pasted into a hex editor.)
- It also happens with PDFs from other sources.

So it’s not a ConTeXt bug. Sorry for the noise.

Greetlings, Hraban
---
http://www.fiee.net
http://wiki.contextgarden.net
GPG Key ID 1C9B22FD
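
The hex-editor check described above can be approximated in a few lines of Python. This is only a sketch under the assumption that the pasted text is available as a Python string (for example, read from a UTF-8 file); `hex_bytes` and `is_decomposed` are names invented for this example.

```python
import unicodedata

def hex_bytes(text: str) -> str:
    """Show the UTF-8 bytes of a string, roughly as a hex editor would."""
    return text.encode("utf-8").hex(" ")

def is_decomposed(text: str) -> bool:
    """True if the text differs from its NFC form, i.e. it contains
    sequences that could be written as precomposed characters."""
    return text != unicodedata.normalize("NFC", text)

precomposed = "\u00fc"     # ü as one code point
combining = "u\u0308"      # u + combining diaeresis

print(hex_bytes(precomposed), is_decomposed(precomposed))
print(hex_bytes(combining), is_decomposed(combining))
```

A precomposed ü shows up as the two bytes `c3 bc`, whereas the decomposed form shows up as `75` (the plain u) followed by `cc 88` (the combining diaeresis).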
On Tue, 20 Mar 2018 08:42:08 +0100, Henning Hraban Ramm wrote:
> I’ve one annoying problem with ConTeXt: all üs (small u umlauts) seem to be encoded as decomposed unicode or something like that, at least every ü breaks into u + garbage if I copy some text from a ConTeXt PDF to an app that doesn’t really support Unicode.

This can depend on the font. While looking into another question, I just checked Cambria, and it uses, e.g., char + accent for some of the umlauts. So concrete code is needed to test this.

--
Ulrike Fischer
http://www.troubleshooting-tex.de/
On 3/22/2018 10:34 AM, Ulrike Fischer wrote:
> On Tue, 20 Mar 2018 08:42:08 +0100, Henning Hraban Ramm wrote:
>> I’ve one annoying problem with ConTeXt: all üs (small u umlauts) seem to be encoded as decomposed unicode or something like that, at least every ü breaks into u + garbage if I copy some text from a ConTeXt PDF to an app that doesn’t really support Unicode.
> This can depend on the font. I just looked for another question at cambria and it e.g. uses char + accent for some of the Umlauts. So concrete code is needed to test this.

btw, the same is true for ligature building (but i already explained that many times so i won't repeat myself)

Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------