Re: Last Call: 'The APPLICATION/MBOX Media-Type' to Proposed Standard

"Eric A. Hall" <ehall@xxxxxxxxx> · Tue, 10 Aug 2004 11:14:21 -0400

Thanks for the comments.

If we start from the premise that messages in mbox files are RFC2822
objects, then your conclusions and arguments are appropriate and correct.
However, that premise is demonstrably false, and most of the conclusions
which follow that premise are also false.

Let's be clear about this: mbox files ARE NOT collections of 2822 objects,
but instead are app-specific databases of app-specific message objects. As
such, the right way to handle them is to treat them as opaque databases,
just as if they were Eudora databases, or Outlook databases, or anything
else.

As a simple example of this principle, mbox files often contain relative
email addresses with no domain qualifier (especially common with users who
exchange messages with other local users, but this isn't limited to that
context). OTOH, section 3.4.1 of RFC2822 unambiguously states that email
addresses (those that use addr-spec) must be qualified. So, starting with
this message from the E2E-Interest archives:

ftp://ftp.isi.edu/end2end/end2end-interest-1985-1987.mail

| Date: Wed, 3 Jun 87 13:18:28 PDT
| From: Bob Braden <braden>
| Message-Id: <8706032018.AA02832@xxxxxxxxxxxxxx>
| To: end2end-interest
| Cc: postel

What should an import agent do when it finds that, and wants to import it
into an IMAP folder, or wants to remail the message? Should it append
the local domain name? Should it append a default domain name? Should it
reject the message because it doesn't conform to 2822 and because no
assumptions are safe? Clearly, the message is not conformant to 2822, and
the data has to be analyzed in the context of the application which
created it, rather than the application-neutral format that we all wish
was being used. From that position, out-of-band negotiation over the
formatting of the database is the only thing that will work.
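
As a rough illustration of the problem (a sketch only, not anything the
draft specifies), an importer could at least detect unqualified addresses
before deciding what to do with them. The function name
unqualified_addresses and the choice of From/To/Cc headers are assumptions
made for this example; the parsing uses Python's standard email package.

```python
from email import message_from_bytes
from email.utils import getaddresses

# Headers taken from the E2E-Interest example; the body is a placeholder.
RAW = (
    b"Date: Wed, 3 Jun 87 13:18:28 PDT\n"
    b"From: Bob Braden <braden>\n"
    b"To: end2end-interest\n"
    b"Cc: postel\n"
    b"\n"
    b"body\n"
)


def unqualified_addresses(raw):
    """Return addresses in From/To/Cc that carry no domain part."""
    msg = message_from_bytes(raw)
    values = []
    for name in ("From", "To", "Cc"):
        values.extend(msg.get_all(name, []))
    # getaddresses() yields (display-name, address) pairs.
    return [addr for _, addr in getaddresses(values) if addr and "@" not in addr]


print(unqualified_addresses(RAW))
```

Detection is the easy part. Which domain to append, if any, is exactly the
information that is not in the file and has to come from somewhere else.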

Addresses are a minor example, and one that a lot of folks would gladly
brush off as irrelevant, but there are a lot more examples, some of which
are much more significant. Since the messages are not 2822-compliant
objects, there is no guarantee (or even any reasonable assumption) that
binary objects have been previously encoded into a safe format. Messages
can easily contain untagged 8-bit characters, bare CR and/or LF, and lines
thousands of characters long, all of which violate other assumptions about
2822 formatting. Enforcing the 998-character line-length limit would
destroy the data.
Automatic EOL conversion (as implied by text/* media-type definitions)
would destroy the data. And so forth.
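
To make those hazards concrete, here is a minimal sketch of a check that
flags them in a raw blob. The function name transport_hazards and the
labels it returns are invented for this example. It assumes LF is the
file's own line terminator, so a bare LF cannot be told apart from a line
ending and is not reported.

```python
def transport_hazards(raw):
    """Report properties of raw bytes that break 2822-style assumptions."""
    hazards = set()
    # Any byte above 0x7F is 8-bit data with no charset information attached.
    if any(byte > 0x7F for byte in raw):
        hazards.add("8bit")
    # A CR that is not part of a CRLF pair is a bare CR.
    if b"\r" in raw.replace(b"\r\n", b""):
        hazards.add("bare-CR")
    # 998 characters (excluding the line terminator) is the 2822 limit.
    if any(len(line.rstrip(b"\r")) > 998 for line in raw.split(b"\n")):
        hazards.add("long-line")
    return hazards


print(transport_hazards(b"plain ascii\n"))
print(transport_hazards(b"caf\xe9 \r and " + b"x" * 2000 + b"\n"))
```

A non-empty result means the data cannot safely pass through anything that
assumes well-formed 2822 text.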

What if two local users are exchanging Big5-encoded messages, but neither
mailer has tagged the messages with the appropriate MIME tags (not
uncommon, and not a requirement, since these are not 2822-compliant, nor
even MIME-compliant)? What assumptions should an importer make? The only
safe thing here is out-of-band negotiation ("these are Big5").
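
A small sketch of why guessing does not work: the same untagged bytes
decode to the intended text only when the charset is supplied from outside
the file. The function decode_untagged and its out_of_band_charset
parameter are hypothetical names for this example, and refusing to guess is
just one possible policy.

```python
def decode_untagged(body, out_of_band_charset=None):
    """Decode an untagged body, refusing to guess a charset for 8-bit data."""
    if out_of_band_charset is not None:
        return body.decode(out_of_band_charset)
    try:
        return body.decode("ascii")
    except UnicodeDecodeError:
        raise ValueError(
            "untagged 8-bit data; charset must be supplied out of band"
        ) from None


text = "\u4e2d\u6587"          # two CJK characters
body = text.encode("big5")     # what the local mailer wrote, with no MIME tags

print(decode_untagged(body, "big5") == text)   # correct with the right hint
print(body.decode("latin-1") == text)          # a wrong guess yields garbage
```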

It would be nice if mbox files only contained 2822 objects. But they
don't. They don't even have to conform to any one set of assumptions, and
can contain messages in different charsets (all of which are untagged), or
which have different relative domains (all of which are unspecified), or
any number of other variances. As such, they have to be treated as opaque
databases, not as collections of well-formed 2822 objects.

I also note that the digest media-type is already specified and is the
appropriate interchange format if you actually do have a collection of
well-formed 2822 objects. But if you have an mbox file, you have to
exchange it as an opaque database, and you have to delineate any internal
assumptions through out-of-band negotiations. The mbox media-type is for
use with tagging and identifying the data being exchanged ("here is an
opaque database of unspecified message objects") only.

Detailed responses to points follow:

On 8/10/2004 7:19 AM, Ian Jackson wrote:

> The IESG writes ("Last Call: 'The APPLICATION/MBOX Media-Type' to Proposed Standard "):
> 
>>The IESG has received a request from an individual submitter to consider the 
>>following document:
>>
>>- 'The APPLICATION/MBOX Media-Type '
>>   <draft-hall-mime-app-mbox-02.txt> as a Proposed Standard
>>
>>The IESG plans to make a decision in the next few weeks, and solicits
>>final comments on this action.  Please send any comments to the
>>iesg@xxxxxxxx or ietf@xxxxxxxx mailing lists by 2004-09-06.
> 
> I have the following comments:
> 
>  * This specification is incomplete.  There are unresolved issues
>    regarding the semantics of the format.

This is not intended to serve as an authoritative reference to the mbox
database format, but is only intended to provide an identifier for the
database-type when it is transferred. Out-of-band negotiation is necessary
in all cases anyway, and I don't really think it's appropriate for the
IETF to define an application-specific database format.

I'd like to see one, and I'd like to see whatever *NIX consortium is
responsible for such things get together and define one.

>  * Since mbox files are text files (assuming that any binary messages
>    in the mailbox are themselves encoded) and can be read sensibly
>    with the naked eye, the content type should be text/* not
>    application/*.  This will also remove ambiguity surrounding line
>    endings.

Automatic EOL conversion destroys message objects. There is no assumption
that messages are conformant with 2822, nor can there be any such assumption.
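
The loss is easy to reproduce. The sketch below assumes a typical pair of
text-mode converters (to_crlf and to_lf are names invented here, not any
particular agent's code): one normalizes lone LFs to CRLF for transport,
the other maps CRLF back to LF on receipt. A CRLF that was part of the
message data does not survive the round trip.

```python
import re


def to_crlf(data):
    """Sender side: turn every LF not already preceded by CR into CRLF."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


def to_lf(data):
    """Receiver side: turn every CRLF back into a local LF."""
    return data.replace(b"\r\n", b"\n")


# An mbox fragment whose message body contains a literal CRLF as data.
original = b"From braden Wed Jun  3 13:18:28 1987\n\nraw\r\nbytes\n"
round_tripped = to_lf(to_crlf(original))

print(round_tripped == original)
print(len(original) - len(round_tripped))
```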

>  * Since an mbox is actually an aggregate type - a way of encoding a
>    set of RFC822 messages - transfer encodings other than 7bit and
>    8bit should be discouraged.  The spec should probably deprecate
>    them in most cases.

I don't see why. It may be efficient to append an app/mbox database to a
[2822-compliant] email message, and to encode it as QP, or it may be
necessary to encode it as base64 if it contains long lines of raw 8bit data.
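
For instance, a sketch using Python's standard email package (the helper
name wrap_mbox is an assumption of this example): wrapping the raw bytes as
an application/mbox part with base64 gives a 7-bit, short-line
representation that decodes back to the identical bytes.

```python
from email.mime.application import MIMEApplication


def wrap_mbox(raw):
    """Wrap raw mbox bytes as an application/mbox MIME part."""
    # MIMEApplication applies base64 transfer encoding by default.
    return MIMEApplication(raw, "mbox")


# Deliberately hostile content: every byte value plus a 2000-character line.
raw = (
    b"From braden Wed Jun  3 13:18:28 1987\n\n"
    + bytes(range(256))
    + b"x" * 2000
    + b"\n"
)
part = wrap_mbox(raw)

print(part.get_content_type())
print(part["Content-Transfer-Encoding"])
print(part.get_payload(decode=True) == raw)
```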

>  * The Proposed Standard should either include or refer to a specific
>    mbox format.  The fact that there are variant implementations
>    doesn't mean that the Proposed Standard should hesitate to declare
>    those broken (at least, broken when a file is sent as text/mbox).
>    Those variant implementations are not wholly interoperable anyway,
>    and in order to write software which deals correctly with text/mbox
>    it will be necessary for the spec to say what the format is
>    supposed to be !

Well, this would require that every local application be changed to suit
the needs of the transfer format, which is not reasonable. Instead the
goal here is to define a transfer identifier which says that the data
probably looks like XYZ.

>  * The format specified should be that described in Rahu Dhesi's
>    posting to comp.mail.misc in 1996, <4ivk9s$bok@xxxxxxxxxxxxxxxx>.

The message with that message-ID does not define a format.

>  * If an mbox file contains messages with unencoded binary data, the
>    file is difficult to sensibly process on a machine with non-UN*X
>    line-endings, because of the bare CRs in the binary data.  (Bare
>    LFs are fine and look just like line endings, with From_-escaping
>    and all.)  As far as I can tell there is then no non-lossy
>    representation of the file which allows sensible local processing
>    by non-mbox-specific tools.  This issue should be resolved (or at
>    least acknowledged).

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/
