Unexpected space after hyphen in xml/html export
List,

Occasionally an unexpected and unwanted space is inserted following the hyphen of a compound word in html/xml exports. In a document with about 500 such compounds, this occurs 30 times.

The following input:

\setupbackend [export=yes,xhtml=yes]
\starttext
Theocracy, the priest power; monarchy, the one|-|man power; and oligarchy,
the few|-|men power|—|are three forms of vicarious government over the
people, perhaps for them, not by them. Democracy is direct
self|-|government over all the people, for all the people, by all the
people. Our institutions are democratic: theocratic, monarchic, oligarchic
vicariousness is all gone. We have no Divine vicar who is responsible to
God for our politics and religion; only a human attorney, answerable to
the people for his official work. The axis of rotation has changed: the
equator of the old civilization passes through the poles of the new. This
makes some change in the geography of both Church and State.
\stopsection
\stoptext

Produces, in relevant part, the following xml (wrapped for convenience):

Theocracy, the priest power; monarchy, the one-man power; and oligarchy,
the few- men power—are three forms of vicarious government over the
people, perhaps for them, not by them. Democracy is direct self-government
over all the people, for all the people, by all the people. Our
institutions are democratic: theocratic, monarchic, oligarchic
vicariousness is all gone. We have no Divine vicar who is responsible to
God for our politics and religion; only a human attorney, answerable to
the people for his official work. The axis of rotation has changed: the
equator of the old civilization passes through the poles of the new. This
makes some change in the geography of both Church and State.

Note the space after "few-" in the second line of the output text.

(The paragraph is a quotation from Theodore Parker's sermon "The Effect of Slavery on the American People," delivered on July 4, 1858. It is thought by many to be the inspiration for part of Lincoln's Gettysburg Address.)

-- Rik
On 10/7/2018 12:19 AM, Rik Kabel wrote:
But that's not what happened: quite a few people in power have medieval monarchic characteristics, oligarchies are around, etc. Old institutions (which are probably deeply rooted in mankind) are just better at pretending to be different.

Anyway, fixed in the next beta (but you need to keep an eye on disc side effects).

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
On 10/6/2018 19:28, Hans Hagen wrote:
Alas, it is fixed for that particular occurrence, but it still occurs 29 times in the document (using today's beta).
A more extended search shows that there are also spaces after en-dashes (in "Press|–|Citizen" and in "Miniatur|–|Bibliothek der Deutschen Classiker"), but none after em-dashes. Unfortunately, my attempts to reproduce this in a smaller document have so far failed.

Perhaps this quote, in which the problem also occurs, is in line with your other comments:

There is only one party in the United States, the Property Party\nbsp \dots{} and it has two right wings: Republican and Democrat. Republicans are a bit stupider, more rigid, more doctrinaire in their laissez|-|faire capitalism than the Democrats, who are cuter, prettier, a bit more corrupt—until recently\nbsp \dots{} and more willing than the Republicans to make small adjustments when the poor, the black, the anti|-|imperialists get out of hand. But, essentially, there is no difference between the two parties.

(That is from Gore Vidal in 1975. Plus ça change.) In it, I get a space after "anti-".

But more like this and folks will complain about politics on the list. Or worse, encourage it.

-- Rik
Well, you know: no MWE, no solution.
Hans
On 10/8/2018 18:32, Hans Hagen wrote:
And here is the MWE. The culprit, it appears, is bidi. I have tried all documented options (but not all combinations) for \setupdirections, and the only one under which there is no problem is "off". With bidi active, there is a spurious space wherever a line break is introduced. As the example demonstrates, this is not a function of the compounds, but of hyphenation in general.

\setupbackend [export=yes]
\setupdirections [bidi=on]
\starttext
abraca% adjust to cause hyphenation with your textwidth
abra|-|cadabra abra|-|cadabra abra|-|cadabra abra|-|cadabra abra|-|cadabra abra|-|cadabra
abra|-|cadabra abra|-|cadabra abra|-|cadabra abra|-|cadabra abra|-|cadabra abra|-|cadabra
abra-cadabra abra-cadabra abra-cadabra abra-cadabra abra-cadabra abra-cadabra
abra-cadabra abra-cadabra abra-cadabra abra-cadabra abra-cadabra abra-cadabra
\stoptext

(The problem appears in the export html/xml file, not in the PDF.)

-- Rik
On 10/10/2018 14:50, Rik Kabel wrote:
It is not a function of explicit compounds (||), but of the hyphenation of compounds. Normal hyphenation does not bring about the problem.

-- Rik
On 10/10/2018 15:11, Rik Kabel wrote:
I also note that \setupdirections, with every option combination I have tried, has no discernible effect on my export output and can safely be removed from the export mode of my document, so for me this issue disappears. I do not know if this is the general case.

-- Rik
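Rik's workaround (leaving \setupdirections out of the export run) can be sketched as follows. This is a minimal sketch assuming a hypothetical user mode named `export`, switched on from the command line with `context --mode=export yourfile.tex`; the mode name and layout are illustrative and not taken from his document. Bidi stays on for the PDF run and is simply never set up when exporting:

```tex
% Hypothetical user mode "export"; enable it with:
%   context --mode=export yourfile.tex
\startmode[export]
  % export run: write the html/xml export, leave bidi at its default (off)
  \setupbackend[export=yes]
\stopmode

\startnotmode[export]
  % PDF run: keep bidi processing as before
  \setupdirections[bidi=on]
\stopnotmode

\starttext
abra|-|cadabra abra|-|cadabra abra|-|cadabra abra|-|cadabra
\stoptext
```

This only helps when, as in Rik's document, the export does not actually depend on bidi; it sidesteps the spurious space rather than fixing it.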
participants (2)
- Hans Hagen
- Rik Kabel