Re: [Tools-discuss] formatting follies, was The IETF's email

Phillip Hallam-Baker <phill@xxxxxxxxxxxxxxx> · Mon, 21 Aug 2023 15:06:35 -0400

On Mon, Aug 21, 2023 at 1:44 PM Keith Moore <moore@xxxxxxxxxxxxxxxxxxxx> wrote:

    On 8/21/23 13:16, Phillip Hallam-Baker wrote:

      It has occurred
        to me that one way to solve the issue we are having in
        Everything, namely a format that is essentially a subset of HTML
        is going to be easiest to render as HTML by using markdown as
        the document format.

    I am also leaning toward recommending that IETF use a subset of
    HTML.  For IETF's purposes I don't think it's that tricky to define
    the subset used, but there's still a danger of a slippery slope:
    "Hey, HTML already supports the <FROB> tag so why can't we use
    it?"   But it would have the virtue that pretty much everyone's
    email reader would already present such messages correctly.

Hence the attraction of an essentially pointless formatting change.

It is like when we set up the clean room for the wire chamber with a huge great big hump in the room that you had to climb over to get from the dirty side to the clean.

    As for input to the list, we'd have to support:

    - text/plain (with or without format=flowed)

    - text/html (including lots of variations produced by various MUAs,
    now and in the past and future also)

    - multipart/alternative (text/html; text/plain) - probably produce a
    different multipart/alternative with both parts derived from the
    text/html part of the subject message - the output html being a
    simplified version of the input, the output text/plain derived from
    the simplified html.   But the real point here is that it has to be
    dealt with explicitly.

    - and perhaps also strip out some of the input

    And probably need to support markdown or something similar as a
    variant of text/plain, if for no other reason than to give senders
    of text/plain a non-ambiguous way of including ASCII art in their
    messages.  (yes you can use heuristics to try to extract ASCII art
    from text/plain, but it seems tricky to get this right.  I'd rather
    use markdown than heuristics.)

Another argument for markdown.

    And perhaps we'd need to accept markdown embedded in text/html also
    (since many MUAs these days will generate text/html without the
    sender intending it)
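
A rough sketch of how the input cases listed above might be dispatched, using Python's standard email package. The function name pick_source_part is hypothetical, and the choice to prefer the text/html part of a multipart/alternative message is just the suggestion made above, not a settled design:

```python
from email.message import EmailMessage


def pick_source_part(msg: EmailMessage) -> tuple[str, str]:
    """Return (content_type, text) for the part a list processor would start from.

    Assumption: text/html is preferred over text/plain when both are present,
    so that both output parts can be derived from the same input.
    """
    body = msg.get_body(preferencelist=("html", "plain"))
    if body is None:
        raise ValueError("no text/plain or text/html part found")
    return body.get_content_type(), body.get_content()


# Build a multipart/alternative message to exercise the function.
msg = EmailMessage()
msg.set_content("Plain version")
msg.add_alternative("<p>HTML version</p>", subtype="html")

content_type, text = pick_source_part(msg)
print(content_type)
```

A plain text/plain message goes through the same call unchanged, since get_body returns the message itself when it is not multipart. Simplifying the HTML and regenerating the text/plain part would be separate steps not shown here.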

    But I think it's doable.   The thing that bugs me most about this
    is that W3C HTML is a moving target, and it's moving in a
    direction that is less and less amenable to this kind of
    processing over time (or requires that such processing be more and
    more sophisticated over time).

It is not just a moving target, it is a target moving in a different direction.

Back when the HTTP/2 discussion started, I tried to engage and carve out a place for Web Services. I very quickly realized that we don't need Web Services support in HTTP/2: we want them completely separated, with a custom protocol designed for Web Services in which the Well Known service tag is pretty much the only header.

    What we can't really expect is that we can form a WG to specify
    this, that will debate which parts of HTML to allow, and then
    produce an RFC specifying acceptable HTML for the kinds of
    discussion that IETF has.   Instead I think we need a research
    group to conduct experiments with some of these mechanisms in the
    context of one or more technical discussions, and report on their
    experiences and make recommendations.

+1

The one issue I would have there is that there is a risk of being overly restrictive in the content, limiting it to just the types of discussion people are familiar with.

I wrote the following back when I was at CERN. It is based on the approach in TeX, and Don Knuth responded (by email!) saying it looked valid. We did not get anything of the sort supported in Web browsers for another 15 years because 'it wasn't important'.

https://www.w3.org/MarkUp/html3/maths.html

Of course, this is also a slippery slope: why not chemical formulas as well? Why not...

The justification I would give for doing math and just math is that:

1) It is the only markup that is typically used inline in text.
2) If you can't express math in TeX, it isn't math notation any more.

For example, my thesis has a lot of very custom math markup, CSP, Z, and some custom notations. They are all handled by the TeX processor which has a very small number of very powerful rules. And in fact, it does support chemical notations.

Now a separate issue is how people would type this stuff in at the keyboard. And I rather suspect that a lot of the demand for 'plaintext' is really a demand to be able to edit messages from the keyboard.

So given the example https://developer.mozilla.org/en-US/docs/Web/MathML/Element/math,

I suspect most of us would prefer to type something more like:

/sum{n=1}{+/infinity}1/n^2

Only in an XMLish idiom.

The way it is structured in LaTeX is that you can have multiple math notations that map to the fundamental presentation widgets.

At this point, any math markup is going to have to be congruent to MathML in order to be viable. So it has to be possible to translate from the messaging format to MathML. It is also desirable to be able to round trip but this may be lossy.
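
To make the translation requirement concrete, here is a minimal sketch that turns a compact nested-tuple form of the sum above into a MathML tree. The tuple representation, the operator names ("sum", "frac", "sup", "row") and the NAMED table are all hypothetical, chosen for illustration only; no keyboard syntax is being proposed here, and parsing a typed notation into such tuples is not shown:

```python
import xml.etree.ElementTree as ET

# Hypothetical named symbols: name -> (MathML tag, character).
NAMED = {"infinity": ("mi", "\u221e")}

# Hypothetical operator names mapped to MathML container elements.
CONTAINERS = {"frac": "mfrac", "sup": "msup", "row": "mrow"}


def atom(token: str) -> ET.Element:
    """Map a single token to an mn (number), mi (identifier) or mo (operator)."""
    if token in NAMED:
        tag, text = NAMED[token]
    elif token.isdigit():
        tag, text = "mn", token
    elif token.isalpha():
        tag, text = "mi", token
    else:
        tag, text = "mo", token
    node = ET.Element(tag)
    node.text = text
    return node


def to_mathml(expr) -> ET.Element:
    """Recursively convert a nested tuple expression to MathML elements."""
    if isinstance(expr, str):
        return atom(expr)
    head, *args = expr
    if head == "sum":
        lower, upper, body = args
        row = ET.Element("mrow")
        limits = ET.SubElement(row, "munderover")
        sigma = ET.SubElement(limits, "mo")
        sigma.text = "\u2211"
        limits.append(to_mathml(lower))
        limits.append(to_mathml(upper))
        row.append(to_mathml(body))
        return row
    node = ET.Element(CONTAINERS[head])
    for arg in args:
        node.append(to_mathml(arg))
    return node


# The sum of 1/n^2 for n from 1 to +infinity.
expr = ("sum",
        ("row", "n", "=", "1"),
        ("row", "+", "infinity"),
        ("frac", "1", ("sup", "n", "2")))

root = ET.Element("math")
root.append(to_mathml(expr))
print(ET.tostring(root, encoding="unicode"))
```

Going the other way, from MathML back to the compact form, is where the lossiness would show up, since MathML can carry presentation detail that a short typed notation has no place for.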

The big challenge here would be adapting markdown so that quoting nested threads became reliable. The way I see it, things go pear-shaped because there is an interaction between line wrapping and quoting.

So the thread

Fred said
|That is nuts, I am going to yammer on and on and on and on and on and on and on and on and on and on and on

gets wrapped as

Fred said
|That is nuts, I am going to yammer on and on and on 
and on and on and on and on and on and on and on 
and on

And the only way to recover the threading is to apply heuristics.

So one hard and fast rule must be that clients MUST NOT wrap lines. A new line is always a new paragraph.
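
A small sketch of why that rule helps, assuming the '|' quote marker from the example above and treating every non-empty line as one paragraph. The function names are hypothetical:

```python
def quote_depth(line: str, marker: str = "|") -> tuple[int, str]:
    """Return (depth, text), where depth counts the leading quote markers."""
    depth = 0
    rest = line
    while rest.startswith(marker):
        depth += 1
        rest = rest[len(marker):].lstrip(" ")
    return depth, rest


def parse_paragraphs(message: str) -> list[tuple[int, str]]:
    """Treat every non-empty line as one paragraph with its own quote depth."""
    return [quote_depth(line) for line in message.splitlines() if line.strip()]


unwrapped = "Fred said\n|That is nuts, I am going to yammer on and on and on\n"
wrapped = "Fred said\n|That is nuts, I am going to yammer\non and on and on\n"

# Unwrapped: the whole quoted paragraph keeps depth 1.
print(parse_paragraphs(unwrapped))
# Wrapped: the continuation line has lost its marker and parses at depth 0.
print(parse_paragraphs(wrapped))
```

With no client-side wrapping the parser needs no heuristics at all: the quote depth of a paragraph is simply the number of markers at the start of its one line.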

When displaying proportional font text, new paragraphs always receive a separator space (or not) as determined by the user preference; preformatted text does not.

Oh and as for Markdown dialect, I think GitHub has essentially closed that debate.