Re: HTML for email

On 3/1/21 9:22 AM, Phillip Hallam-Baker wrote:

Yes HTML is a disaster for email. But so is plaintext wrapped at 66 characters by the server because people didn't know better.

Ok, first please understand that I'm not blaming anyone.   I'm also not making any proposal, at least not yet, and I am not currently sure about the best way forward.   I'm just making some observations based on long-term experience with email collaboration in IETF.

In some sense, plain text works better for IETF's needs than HTML, and maybe plain text works better for the needs of anyone doing technical discussion over email.  There is some virtue in simplicity.    One of the virtues of plain text email is that there's not (much of) a layer of interpretation between the text in the message and what the recipient sees.   (Of course there is a layer - the character encoding scheme - but not much more than that.)  

That's not to say that there are no technical problems with plain text email or that it can't be improved.   But the biggest problem with improving either is one of deployment.


The reasons HTML is a disaster are

1) There is no standard for HTML in email.

While true, it's not immediately clear that HTML is readily extensible to fix these problems, because it needs to be defined in such a way that a variety of MUA implementations produce consistent behavior when multiple parties make successive edits/replies to a message, when portions of multiple messages are quoted in a message, and so on.    You might, for instance, need to specify the representation of text copied from one email and pasted to another.

One way to view the problem of HTML in email is that many different parties, using different implementations, need to be able to edit the document over and over without producing a corrupted mess.   HTML is not designed for that.   But plain text email has a similar problem, and so does every "word processor" format I've seen that's more complex than, say, WordStar.   Anyone who has been around IETF for a while has seen the effect of multiple layers of line-wrapping and ">" (or similar marks) added at the beginning of lines.

(Actually the problem is even worse, because some MUAs used in a conversation will treat the quoted parts as plain text and others will try to make them into HTML.   So you get multiple incompatible layers.)
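The plain-text half of that damage (repeated line-wrapping plus ">" marks) is easy to reproduce. The following is only a rough sketch: the helper names `quote` and `naive_rewrap` and the 40-column width are made up for illustration, and real MUAs differ in the details. It quotes a message twice, hard-wrapping after each round the way a quote-unaware mailer or server might.

```python
import textwrap

def quote(text: str) -> str:
    """Prefix every line with '> ', as a replying MUA typically does."""
    return "\n".join("> " + line for line in text.splitlines())

def naive_rewrap(text: str, width: int) -> str:
    """Hard-wrap each line at `width`, with no knowledge of quote marks."""
    out = []
    for line in text.splitlines():
        # textwrap.wrap returns [] for an empty line, so keep it as "".
        out.extend(textwrap.wrap(line, width=width) or [""])
    return "\n".join(out)

original = (
    "Plain text works better for technical discussion because there is "
    "little interpretation between the message and what the recipient sees."
)

# Two rounds of "reply, then get re-wrapped somewhere along the way".
message = original
for _ in range(2):
    message = naive_rewrap(quote(message), width=40)

print(message)
```

All of the text came from one message, yet after two rounds the lines carry different quote depths, and the words that spilled over during re-wrapping have no ">" mark at all. A human can still repair this by hand, which is the point made next.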

Still, humans can manually "clean up" text that has been subject to that kind of repeated alteration.  But cleaning up HTML that has suffered similar damage generally doesn't happen, partly because there's a layer between the actual HTML and the user interface that keeps users from doing exactly what they need.  

(I'm not suggesting that we should instead edit the raw HTML in messages when composing replies.   In addition to requiring participants to be HTML experts, the HTML generated by most MUAs is far too messy for that.)

2) HTML has been turned into a presentation format.

I realize this is heresy, but a presentation format is what people actually need in the vast majority of cases.  

Semantic markup has its place.  When you're writing a book or maybe even a long article, you need to focus on content, not layout.   The presentation needs to be fine-tuned after the content is written or mostly written, and often by different people than those who wrote the content.   (And sometimes the content is tweaked for the sake of presentation.)   Semantic markup makes good sense for that kind of application.

But for discussion, a semantic markup layer just gets in the way.   That's also true for most web pages.   Web developers need to be able to dictate what the content looks like on the screen (while still being responsive to different kinds of displays), and they're forced to deal with a layer that tries to second-guess them.

3) Email messages used annotations for a decade before HTML, which doesn't support them.

Right.   And it turns out that we need annotations in email.

4) The SMTP email infrastructure does not provide a viable means of knowing what formats are accepted by a recipient so there is no way to fix this.

I'm not sure that would solve the problem at least for IETF's case, or for any use case that involves large numbers of potential participants.   When you compose a message to send to a mailing list, should your user agent poll the capability of every recipient to find out what kind of message format each can accept?   Should it send out different formats to different recipients?   Should it try to identify a common subset so it only has to generate one message and so that recipients' experiences will be more consistent?   What if you have an email conversation between a small number of people, a new recipient is added, and everyone's messages change format because the new recipient's capabilities don't support the common subset of the other recipients?   What about the very common case when a single recipient has multiple user agents with different capabilities?

In other words, be careful what you wish for.   There's a lot of value in having a common format and minimal set of capabilities that everyone supports.
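To make the "common subset" concern concrete, here is a small sketch. Everything in it is hypothetical: SMTP has no such capability lookup, and the addresses, the capability sets, and the `PREFERENCE` ordering are invented for the example. It picks the richest format that every recipient accepts, and shows the format of a conversation changing when one more recipient is added.

```python
from typing import Dict, Set

# Assumed ordering, richest first. Not taken from any standard.
PREFERENCE = ["text/html", "text/enriched", "text/plain"]

def common_format(capabilities: Dict[str, Set[str]]) -> str:
    """Return the richest format all recipients accept, else text/plain."""
    shared = set.intersection(*capabilities.values()) if capabilities else set()
    for fmt in PREFERENCE:
        if fmt in shared:
            return fmt
    return "text/plain"

# Hypothetical results of polling each recipient's capabilities.
thread = {
    "alice@example.com": {"text/plain", "text/enriched", "text/html"},
    "bob@example.com": {"text/plain", "text/html"},
}
before = common_format(thread)

# A new recipient joins whose agent only handles plain text.
thread["carol@example.com"] = {"text/plain"}
after = common_format(thread)

print(before, "->", after)
```

With these made-up inputs the conversation drops from HTML to plain text as soon as the third recipient is added. The sketch also assumes one capability set per address, which ignores the case of a single recipient with several user agents.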


One painful side effect of 1 and 2 is that messages come with embedded font size specifiers which is beyond stupid. The sender has no idea what device I am reading something on. But Gmail will happily choose font size settings that are frequently stupid. I have no control over that as a user.

But the last point is the most important because the difficulty of fixing the SMTP infrastructure has become greater than the difficulty of replacing it with something fit for purpose.

SMTP has turned out to be surprisingly (to me at least) fit for purpose.   It was designed in an era when you couldn't expect complete and full-time connectivity between senders and receivers, and also couldn't expect everyone to have access to the same network (e.g. ARPAnet vs. X.25), so it used store-and-forward.   But it turned out that store-and-forward was useful even in environments that could provide complete connectivity.   And later on it turned out to be useful for getting mail through firewalls.   In many environments store-and-forward is used to implement spam filters, virus filters, etc., to hide internal enterprise network infrastructure from outside viewers, and several other purposes.   And store-and-forward helps make email more reliable, because it separates the problem of persistent delivery from the sending user agent's responsibility.
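The reliability point, that the relay rather than the sending user agent owns persistent delivery, can be illustrated with a toy model. This is not how any real MTA is implemented: the `Relay` class, the `network` flag, and the `deliver` callback are all invented for the sketch.

```python
from collections import deque

class Relay:
    """Toy store-and-forward hop: accept now, deliver when the next hop is up."""

    def __init__(self, deliver):
        self.deliver = deliver  # callable(message) -> bool, True on success
        self.queue = deque()

    def accept(self, message: str) -> None:
        # The sender's responsibility ends here; the relay now owns delivery.
        self.queue.append(message)

    def run_once(self) -> int:
        """Try each queued message once; keep failures for a later run."""
        delivered = 0
        for _ in range(len(self.queue)):
            message = self.queue.popleft()
            if self.deliver(message):
                delivered += 1
            else:
                self.queue.append(message)
        return delivered

network = {"up": False}  # hypothetical reachability of the next hop
mailbox = []             # stands in for the recipient's mail store

def deliver(message: str) -> bool:
    if network["up"]:
        mailbox.append(message)
        return True
    return False

relay = Relay(deliver)
relay.accept("hello")       # sender is done, even though the next hop is down
first = relay.run_once()    # nothing delivered; message stays queued
network["up"] = True
second = relay.run_once()   # next hop is reachable again; message goes through
```

The sender calls `accept` once and never has to retry. The same hop is also a natural place to hang filtering, which matches the spam and virus filtering uses mentioned above.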

I do think some sort of recipient capability discovery could be useful for most messages that are sent to relatively few recipients (and actually had a proposal for this a few years ago, specifically to discover recipients' public keys), but probably not for IETF-style email discussions.   And implementing capability discovery for email means basically having two kinds of services that need to stay in sync, which creates additional risks.

Keith


