Re: On XML and $EDITORs (Re: Things that used to be clear (was ...)) "Living Documents") side meeting at IETF105.)

Keith Moore <moore@xxxxxxxxxxxxxxxxxxxx> · Wed, 10 Jul 2019 06:09:00 -0400



    On 7/10/19 12:52 AM, Nico Williams wrote:

    
      On Tue, Jul 09, 2019 at 09:01:11PM -0700, Christian Huitema wrote:

      
        On 7/9/2019 1:34 AM, Nico Williams wrote:

        
          XML with webby $EDITOR tooling would do.

        
        Tooling is just one of the problems with XML2RFC. The real issue is
that XML2RFC is completely specific to the IETF. This translates into
training requirements for people who need to actually use that markup
language, absence of easy to use tools because the user pool is too
small to sustain development, and then a reliance on translators
between an easy-to-edit format and the publication. For example, a
team of authors would be using markdown and Github, and using a tool
chain to produce XML2RFC. But if a copy editor suggests updates to the
XML text, these updates cannot easily be imported to the original
markdown document, or to the markdown starter for the "bis" project.

      
      Office and LibreOffice use XML too, but users don't see it.  That's what
I meant by "webby $EDITOR tooling" above: a bloody real UI, a browser UI.
    
    While that's probably better than editing raw XML, I'm
      unfavorably impressed with UIs for editing XML (and that includes
      UIs that edit HTML and variants thereof).   A possibly familiar
      example of what I'm talking about: you're editing a document that
      is internally represented by HTML or XML and trying to delete
      white space between two chunks of text that are at different
      levels of hierarchy.   All of a sudden you've "deleted too much" -
      the visual difference between those two chunks, that reflects the
      difference in the XML hierarchy, disappears.   You weren't trying
      to collapse the hierarchy, you were just trying to get rid of
      distracting and meaningless white space.   Or a similar problem -
      you want extra white space, say, between items in a
      bulleted list, and the editor keeps trying to optimize out that
      white space because it sees it as superfluous.  

    
    At first glance one might assume that the problem is the editor
      implementation.   But you really can't fix it in a WYSIWYG editor
      specifically because it hides the underlying representation from
      the user.    And that means that there are circumstances in which
      "delete text at the cursor" is ambiguous, or potentially means
      multiple things, some of which are invisible.   The designer of
      the editor has a choice - does each delete remove something in the
      underlying representation (some of which may be invisible, so it
      looks to the user like the delete key is unreliable), or does each
      delete remove all of the differences in markup between the
      preceding and following chunk of text, or something in between?  
      There's no good answer, especially because the order in which
      those invisible things get deleted from the underlying
      representation really matters and the user can't see the order.  
      (Of course, it's not only XML-ish representations that have this
      problem, but XML-ish representations exacerbate it.)
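
A minimal sketch of the ambiguity described above, in Python. It assumes each cursor position is identified by a path of (tag, index) pairs from the document root; the function name `boundaries_between` and the sample paths are hypothetical and not taken from any real editor. It lists the element boundaries that sit, invisibly, at a single visible cursor position between two chunks of text at different levels of hierarchy.

```python
def boundaries_between(path_a, path_b):
    """Return the markup boundaries crossed when moving from the end of the
    text at path_a to the start of the text at path_b.

    Each path is a list of (tag, index) pairs from the root down to the
    element that holds the text. This is an illustrative model only.
    """
    # Length of the shared prefix, i.e. the common ancestors.
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    # Everything below the common ancestor on the left side gets closed...
    closes = ["</%s>" % tag for tag, _ in reversed(path_a[common:])]
    # ...and everything below it on the right side gets opened.
    opens = ["<%s>" % tag for tag, _ in path_b[common:]]
    return closes + opens


# Text at the end of the last list item, followed by a top-level paragraph.
before = [("doc", 0), ("ul", 0), ("li", 1), ("p", 0)]
after = [("doc", 0), ("p", 1)]

print(boundaries_between(before, after))
```

In this model one cursor position corresponds to four boundaries. A single press of the delete key could plausibly remove any one of them, or all of them, and the user cannot see which.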

    
    The fundamental problem is that XML is really a poor
      representation of text.  This is especially true for editing, but
      not just for editing.   Text is not hierarchical.   How do you
      represent in XML a comment on a particular block of text that,
      say, overlaps multiple XML elements but doesn't completely contain
      all of them?   In a document which has been edited by multiple
      users, how do you represent in XML the changes made by each
      user?   I'm not saying that it absolutely cannot be done, but it's
      either going to be ugly or it's going to abandon many of the
      properties that made XML appear to be attractive in the first
      place.
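
The overlapping-comment case can be sketched in Python with the standard `xml.etree.ElementTree` module. The element names (`doc`, `p`, `comment`), the sample text, and the helper functions are made up for illustration. Inline markup for a comment that starts in one paragraph and ends in the next is not well-formed XML; one workaround is to store the comment as character offsets beside the text (often called stand-off markup), which parses but no longer uses the element hierarchy to say what the comment covers.

```python
import xml.etree.ElementTree as ET

paragraphs = ["Text is not hierarchical.", "Comments can span paragraphs."]

# Naive inline markup: the comment opens inside the first <p> and closes
# inside the second, so the elements overlap instead of nesting.
naive = ("<doc><p>Text is not <comment>hierarchical.</p>"
         "<p>Comments can</comment> span paragraphs.</p></doc>")


def is_well_formed(xml_text):
    """True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def paragraph_spans(paras):
    """Character range of each paragraph when joined by single spaces."""
    spans, pos = [], 0
    for p in paras:
        spans.append((pos, pos + len(p)))
        pos += len(p) + 1  # one separator character between paragraphs
    return spans


def overlapped(spans, start, end):
    """Indexes of the spans that intersect the range [start, end)."""
    return [i for i, (s, e) in enumerate(spans) if s < end and start < e]


# Stand-off alternative: the comment is a range over the flat text.
text = " ".join(paragraphs)
start = text.index("hierarchical")
end = text.index("Comments can") + len("Comments can")
standoff = ("<doc><p>%s</p><p>%s</p><comment start='%d' end='%d'/></doc>"
            % (paragraphs[0], paragraphs[1], start, end))

print(is_well_formed(naive), is_well_formed(standoff))
print(overlapped(paragraph_spans(paragraphs), start, end))
```

The stand-off form works, but every tool that edits the text now has to keep the offsets in step with it, which is the kind of ugliness mentioned above.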
    
    
        I understand why we adopted an XML format 20 years ago. That was
better than NROFF, and there was a hope that the whole publishing
industry would standardize on XML. It did not, and now the IETF has
its very own markup language.

      
    In some sense, nroff really was better.   Probably not better
      overall, but at least nroff usually wouldn't throw up its hands
      and completely refuse to render a document (within an hour of a
      deadline) because you left out or misspelled some directive.   And
      in nroff you didn't have the UI problem.    I'm not arguing for a
      return to nroff.   XML is definitely more powerful in some ways,
      and XSLT is nice.  (I've written tools to convert nroff to other
      representations and it was neither easy nor fun.)  But we went
      from one obscure and specialized text representation to another,
      and the newer representation is in some ways a poorer reflection
      of the text than the older one.

    Anyway, if we're really going to try to improve our tools, we
      shouldn't naively assume that XML is the right direction for
      underlying representation.   Again, it can probably be made to
      work, but I suspect only by realizing that we simply can't force
      everything into a hierarchy.   So the XML would not be a natural
      representation for the text; it would only be a contrived
      representation that had to be converted back and forth to and from a
      better internal representation.  (And if you don't define that
      internal representation and the algorithm for conversion, each
      editing tool is going to do it differently, which creates another
      problem - importing the text into a tool, making any edit at all,
      and saving it will change the representation and likely how the
      text is displayed.)
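
A small-scale analogue of that round-trip effect, sketched with Python's standard `xml.etree.ElementTree` (the XML fragment is made up, and `ET.canonicalize` needs Python 3.8 or later). Parsing and re-serializing without making any edit already changes the stored text, even though the two forms are the same document once canonicalized. A tool that maps XML to and from its own internal model would presumably introduce larger differences than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: single-quoted attribute and an explicit
# start/end tag pair for an empty element.
src = "<t anchor='x'><list></list></t>"


def round_trip(xml_text):
    """Parse the text and serialize it again with no edits in between."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


saved = round_trip(src)

# The serialized text differs from what was loaded...
same_text = saved == src
# ...yet both forms canonicalize to the same document.
same_document = ET.canonicalize(src) == ET.canonicalize(saved)

print(saved)
print(same_text, same_document)
```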

    
    Keith