XML, dealing with whitespace
Hi all,
I have sources that look like this:
%%%%%%%%%%%%%%%%%%%%%
<?xml version="1.0" encoding="UTF-8"?>
<article>
<p>Bla Bla Bla</p>
<p>
<underline>
<italic>Bla</italic>
</underline>, Bla Bla.</p>
</article>
%%%%%%%%%%%%%%%%%%%%%
Typesetting this with ConTeXt gives me a spurious space after the underlined Bla in italics. Complete MWE:
%%%%%%%%%%%%%%%%%%%%%
\startxmlsetups xml:test
\xmlsetsetup{#1}{*}{-}
\xmlsetsetup{#1}{article|p|italic|underline}{xml:*}
\stopxmlsetups
\xmlregistersetup{xml:test}
\startxmlsetups xml:article
\starttext
\xmlflush{#1}
\stoptext
\stopxmlsetups
\startxmlsetups xml:p
\xmlflush{#1}\par
\stopxmlsetups
\startxmlsetups xml:italic
\emph{\xmlflush{#1}}
\stopxmlsetups
\startxmlsetups xml:underline
\underbar{\xmlflush{#1}}
\stopxmlsetups
\startbuffer[test]
<?xml version="1.0" encoding="UTF-8"?>
<article>
<p>Bla Bla Bla</p>
<p>
<underline>
<italic>Bla</italic>
</underline>, Bla Bla.</p>
</article>
\stopbuffer
\xmlprocessbuffer{test}{test}{}
%%%%%%%%%%%%%%%%%%%%%
How can I get rid of spurious leading and trailing whitespace? I've found \xmlstrip and \xmlstripped, but I don't really understand how they work. I've also found out about
\ignorespaces\xmlflush{#1}\removeunwantedspaces
but this would then have to be added to every definition, which would be a bit tedious...
There have been a couple of similar questions by Hans van der Meer about a decade ago, but I couldn't find the answer.
Then, \xmlstripanywhere is also mentioned in xml-mkiv.pdf, but it's not explained. I found one example in the sources (https://source.contextgarden.net/tex/context/modules/mkiv/x-html.mkiv?search...), but what does it do? Is it needed for \xmlstrip and friends to work?
So, what would be the best way to deal with that situation? (More details below; perhaps there's an easier solution outside of ConTeXt, because the problem is actually caused by XSLT...)
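For reference, this is roughly what that workaround would look like when applied to the two inline setups of the MWE above. It is only a sketch: the setup names are the ones from the MWE, and I am assuming that the whitespace that matters is the line break just inside the opening and closing tags.
%%%%%%%%%%%%%%%%%%%%%
% Sketch: same inline setups as in the MWE, with the flushed content
% wrapped in \ignorespaces ... \removeunwantedspaces.
\startxmlsetups xml:italic
\emph{\ignorespaces\xmlflush{#1}\removeunwantedspaces}
\stopxmlsetups
\startxmlsetups xml:underline
\underbar{\ignorespaces\xmlflush{#1}\removeunwantedspaces}
\stopxmlsetups
%%%%%%%%%%%%%%%%%%%%%
The other setups (xml:test, xml:article, xml:p) would stay as they are, but every further inline element would need the same wrapping.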
Best,
Denis
P.S. Background:
I convert docx files with pandoc to JATS XML. Pandoc does quite a decent job, but I need to tweak a few things with XSLT. The actual transformation that I need works OK, but the transformation also causes other problems.
This is the original Markdown file:
%%%%%%%%%%%%%%%%%%%%%%%
Bla Bla Bla
[*Bla*]{.underline} Bla Bla.
%%%%%%%%%%%%%%%%%%%%%%%
Pandoc produces a JATS XML file that looks like this (simplified, empty nodes deleted):
%%%%%%%%%%%%%%%%%%%%%%%
<?xml version="1.0" encoding="utf-8" ?>
<article>
<body>
<p>Bla Bla Bla</p>
<p><underline><italic>Bla</italic></underline>, Bla Bla.</p>
</body>
</article>
%%%%%%%%%%%%%%%%%%%%%%%
I use this XSL for tweaking pandoc's output:
%%%%%%%%%%%%%%%%%%%%%%%
<?xml version="1.0" encoding="UTF-8"?>
Denis Maier via ntg-context wrote on 15.01.2022 at 13:04:
Typesetting this with context gives me a spurious space after the underlined Bla in italics.
There is no spurious space; the line break is simply converted to a space, and I see no reason why this shouldn't happen.

To remove space before or after certain parts of text within a paragraph you can use the \removeunwantedspaces and \ignorespaces commands.

%%%% begin example
\starttexdefinition RemovePreceding #1
  \removeunwantedspaces
  #1
\stoptexdefinition

\starttexdefinition RemoveFollowing #1
  #1
  \ignorespaces
\stoptexdefinition

\starttext
Bla \RemovePreceding{Bla} Bla

Bla \RemoveFollowing{Bla} Bla
\stoptext
%%%% end example

When only following spaces are a problem, a better alternative to \ignorespaces is \autoinsertnextspace, which checks the following token and inserts a space only when the next character is not punctuation.

%%%% begin example
\starttexdefinition Italic #1
  \emphasized{#1}
  \autoinsertnextspace
\stoptexdefinition

\starttexdefinition Underbar #1
  \underbar{#1}
\stoptexdefinition

\starttext
Bla Bla Bla

\Underbar{\Italic{Bla} , Bla Bla.}
\stoptext
%%%% end example

Wolfgang
Hi Wolfgang,
From: Wolfgang Schuster