On 8/21/23 13:16, Phillip Hallam-Baker wrote:
I am also leaning toward recommending that IETF use a subset of HTML. For IETF's purposes I don't think it's that tricky to define the subset used, but there's still a danger of a slippery slope: "Hey, HTML already supports the <FROB> tag so why can't we use it?" But it would have the virtue that pretty much everyone's email reader would already present such messages correctly.
It has occurred to me that one way to solve the issue we are having in Everything, namely that a format that is essentially a subset of HTML is going to be easiest to render as HTML, is to use markdown as the document format.
(and yet, many of those MUAs would corrupt such HTML when generating replies - which means that the list processor would absolutely have to "clean up" the HTML before forwarding it to the list recipients.)
As for input to the list, we'd have to support:
- text/plain (with or without format=flowed)
- text/html (including lots of variations produced by various MUAs, now and in the past and future also)
- multipart/alternative (text/html; text/plain) - probably produce a different multipart/alternative with both parts derived from the text/html part of the subject message - the output html being a simplified version of the input, the output text/plain derived from the simplified html. But the real point here is that it has to be dealt with explicitly.
- and perhaps also strip out some of the input
And we'd probably need to support markdown or something similar as a variant of text/plain, if for no other reason than to give senders of text/plain an unambiguous way of including ASCII art in their messages. (Yes, you can use heuristics to try to extract ASCII art from text/plain, but it seems tricky to get this right. I'd rather use markdown than heuristics.)
And perhaps we'd need to accept markdown embedded in text/html also (since many MUAs these days will generate text/html without the sender intending it).
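To make the input cases above concrete, here is a rough sketch of how a list processor might decide which part of an incoming message to work from. It uses only Python's standard email package. The function name classify_input and the kind labels are my own invention for illustration, and the sketch ignores attachments and markdown detection entirely.

```python
from email.message import EmailMessage


def classify_input(msg):
    """Return (kind, text) for the part a list processor would work from.

    kind is "html", "plain-flowed", "plain", or None if there is no text part.
    The labels are hypothetical; they are not taken from any specification.
    """
    html_part = None
    plain_part = None
    # walk() yields the message itself and then every nested part, so this
    # handles single-part messages and multipart/alternative alike.
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "text/html" and html_part is None:
            html_part = part
        elif ctype == "text/plain" and plain_part is None:
            plain_part = part

    # Prefer text/html: the text/plain alternative is not always an
    # equivalent rendering of it, so both outputs are derived from the HTML.
    if html_part is not None:
        return "html", html_part.get_content()
    if plain_part is not None:
        flowed = (plain_part.get_param("format") or "").lower() == "flowed"
        return ("plain-flowed" if flowed else "plain"), plain_part.get_content()
    return None, ""
```

This assumes the message was parsed or built with the modern email policy, so that get_content() is available on each part. A real processor would also have to cope with malformed MIME structure and undeclared charsets, which this does not attempt.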
But I think it's doable. The thing that bugs me most about this
is that W3C HTML is a moving target, and it's moving in a
direction that is less and less amenable to this kind of
processing over time (or requires that such processing be more and
more sophisticated over time).
What we can't really expect is that we can form a WG to specify
this, that will debate which parts of HTML to allow, and then
produce an RFC specifying acceptable HTML for the kinds of
discussion that IETF has. Instead I think we need a research
group to conduct experiments with some of these mechanisms in the
context of one or more technical discussions, and report on their
experiences and make recommendations.
Keith
p.s. The list can't simply strip out the text/html portion of a
multipart/alternative, because the text/plain portion is not
always a plain text representation of the HTML. You have to
convert the text/html to something simpler.
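For what it's worth, the "convert to something simpler" step could look roughly like the following sketch, which uses Python's standard html.parser. The allow-list here is purely an assumption on my part (choosing it is the hard question), and the names ALLOWED, DROP_CONTENT, Simplifier and simplify are made up for the example. It drops all attributes, removes script and style content, and keeps the text of any tag it does not recognize.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list; which tags belong here is the open question.
ALLOWED = {"p", "br", "pre", "blockquote", "ul", "ol", "li", "em", "strong"}
# Elements whose content is discarded along with the tags.
DROP_CONTENT = {"script", "style"}
# Tags that start a new line in the derived plain text.
LINE_BREAKING = {"p", "br", "pre", "blockquote", "li"}


class Simplifier(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.html = []       # pieces of the simplified HTML
        self.text = []       # pieces of the derived plain text
        self.skip_depth = 0  # > 0 while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED and not self.skip_depth:
            # All attributes are dropped, including style and class.
            self.html.append("<%s>" % tag)
            if tag in LINE_BREAKING:
                self.text.append("\n")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED and tag != "br" and not self.skip_depth:
            self.html.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.html.append(escape(data, quote=False))
            self.text.append(data)


def simplify(source):
    """Return (simplified_html, plain_text), both derived from source."""
    parser = Simplifier()
    parser.feed(source)
    parser.close()
    return "".join(parser.html), "".join(parser.text).strip()
```

Both output parts come from the same simplified stream, so the text/plain part is always a rendering of the HTML that was actually sent. The sketch does not repair badly nested tags or preserve links, both of which a real list processor would need to handle.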