Re: [Tools-discuss] formatting follies, was The IETF's email

Keith Moore <moore@xxxxxxxxxxxxxxxxxxxx> · Mon, 21 Aug 2023 13:44:48 -0400



    On 8/21/23 13:16, Phillip Hallam-Baker wrote:

    
      It has occurred
        to me that one way to solve the issue we are having in
        Everything, namely a format that is essentially a subset of HTML
        is going to be easiest to render as HTML by using markdown as
        the document format.
    
    I am also leaning toward recommending that IETF use a subset of
    HTML.  For IETF's purposes I don't think it's that tricky to define
    the subset used, but there's still a danger of a slippery slope:
    "Hey, HTML already supports the <FROB> tag so why can't we use
    it?"   But it would have the virtue that pretty much everyone's
    email reader would already present such messages correctly.

    
    (and yet, many of those MUAs would corrupt such HTML when generating
    replies - which means that the list processor would absolutely have
    to "clean up" the HTML before forwarding it to the list recipients.)
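
    A minimal sketch of what such a "clean up" pass might look like,
    assuming Python's standard html.parser module and an allowlist
    approach.  The ALLOWED set, the class name SubsetFilter and the
    function simplify_html are hypothetical illustrations, not a
    proposed IETF subset.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist; which tags belong here is the open question.
ALLOWED = {"p", "br", "pre", "blockquote", "ul", "ol", "li",
           "em", "strong", "code", "a"}
VOID = {"br"}                      # tags that take no closing tag
DROP_CONTENT = {"script", "style"}  # tags whose text is discarded too


class SubsetFilter(HTMLParser):
    """Re-emit only allowlisted tags, dropping every attribute except
    an http(s)/mailto href on <a>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED and not self.skip_depth:
            href = dict(attrs).get("href") if tag == "a" else None
            if href and href.startswith(("http://", "https://", "mailto:")):
                self.out.append('<a href="%s">' % escape(href, quote=True))
            else:
                self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED and tag not in VOID and not self.skip_depth:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text inside unknown tags is kept; only the tags are dropped.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def simplify_html(html_in: str) -> str:
    parser = SubsetFilter()
    parser.feed(html_in)
    parser.close()
    return "".join(parser.out)
```

    With this sketch, an unknown tag such as <FROB> disappears while
    its text survives, which is one way to resist the slippery slope:
    anything not in the allowlist simply never reaches the list.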

    
    As for input to the list, we'd have to support:

    
    - text/plain (with or without format=flowed)

    - text/html (including the many variations produced by MUAs past,
    present, and future)

    - multipart/alternative (text/html; text/plain) - probably produce a
    different multipart/alternative with both parts derived from the
    text/html part of the subject message - the output html being a
    simplified version of the input, the output text/plain derived from
    the simplified html.   But the real point here is that it has to be
    dealt with explicitly.

    - and perhaps also strip out some of the input
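
    The input cases listed above could be told apart with something
    like the following sketch, which assumes Python's standard email
    package.  The function name classify_input and the labels it
    returns are hypothetical; a real list processor would hand each
    case to its own handler.

```python
from email import message_from_string


def classify_input(raw: str) -> str:
    """Return a label naming which input case a raw message falls into."""
    msg = message_from_string(raw)
    ctype = msg.get_content_type()  # defaults to text/plain if absent

    if ctype == "text/plain":
        # format=flowed is a Content-Type parameter (RFC 3676).
        fmt = str(msg.get_param("format", "fixed")).lower()
        return "plain-flowed" if fmt == "flowed" else "plain-fixed"

    if ctype == "text/html":
        return "html"

    if ctype == "multipart/alternative":
        subtypes = {part.get_content_type() for part in msg.walk()}
        if "text/html" in subtypes:
            # Both output parts would be derived from the HTML part.
            return "alternative-with-html"
        return "alternative-no-html"

    return "unsupported"
```

    The point of dispatching explicitly is that every case gets a
    deliberate decision, including the ones that end up rejected.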

    
    And probably need to support markdown or something similar as a
    variant of text/plain, if for no other reason than to give senders
    of text/plain a non-ambiguous way of including ASCII art in their
    messages.  (yes you can use heuristics to try to extract ASCII art
    from text/plain, but it seems tricky to get this right.  I'd rather
    use markdown than heuristics.)
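
    As an illustration of the non-heuristic approach, here is a sketch
    that splits a text/plain body into prose and literal segments,
    assuming senders mark ASCII art with markdown-style fences of three
    backticks.  The function name split_fenced and the segment labels
    are hypothetical.

```python
FENCE = "`" * 3  # markdown-style code fence marker


def split_fenced(body: str):
    """Split a plain-text body into ("prose", text) and ("literal", text)
    segments, so only prose is ever reflowed or re-wrapped."""
    segments = []
    kind, buf = "prose", []
    for line in body.splitlines():
        if line.strip().startswith(FENCE):
            # A fence line ends the current segment and flips the mode.
            if buf:
                segments.append((kind, "\n".join(buf)))
            kind = "literal" if kind == "prose" else "prose"
            buf = []
        else:
            buf.append(line)
    if buf:
        segments.append((kind, "\n".join(buf)))
    return segments
```

    Literal segments keep their leading whitespace untouched, so a
    diagram survives whatever reflowing is applied to the prose.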

    
    And perhaps we'd need to accept markdown embedded in text/html also
    (since many MUAs these days will generate text/html without the
    sender intending it)

    But I think it's doable.   The thing that bugs me most about this
      is that W3C HTML is a moving target, and it's moving in a
      direction that is less and less amenable to this kind of
      processing over time (or requires that such processing be more and
      more sophisticated over time).

      
      What we can't really expect is that we can form a WG to specify
      this, that will debate which parts of HTML to allow, and then
      produce an RFC specifying acceptable HTML for the kinds of
      discussion that IETF has.   Instead I think we need a research
      group to conduct experiments with some of these mechanisms in the
      context of one or more technical discussions, and report on their
      experiences and make recommendations.
    Keith
    p.s.   The list can't simply strip out the text/html portion of a
      multipart/alternative, because the text/plain portion is not
      always a plain text representation of the HTML.   You have to
      convert the text/html to something simpler.
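
    A sketch of that conversion, assuming Python's standard
    html.parser and email.message modules: the text/plain part is
    derived from the (already simplified) HTML instead of being taken
    from the sender's own text/plain part.  The names TextExtractor,
    html_to_text and build_alternative are hypothetical, and the text
    conversion is deliberately crude.

```python
from email.message import EmailMessage
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, starting a new line at block-level tags."""

    BLOCK = {"p", "br", "div", "li", "blockquote", "pre"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK:
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html_in: str) -> str:
    parser = TextExtractor()
    parser.feed(html_in)
    parser.close()
    lines = [ln.rstrip() for ln in "".join(parser.chunks).splitlines()]
    # Drop blank lines; leading whitespace is kept for <pre> content.
    return "\n".join(ln for ln in lines if ln.strip()) + "\n"


def build_alternative(simplified_html: str) -> EmailMessage:
    """Build multipart/alternative with both parts derived from the HTML."""
    msg = EmailMessage()
    msg.set_content(html_to_text(simplified_html))         # text/plain
    msg.add_alternative(simplified_html, subtype="html")   # text/html
    return msg
```

    Because both parts come from the same source, the two alternatives
    cannot disagree with each other the way sender-supplied parts can.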