Thanks for the comments.

If we start from the premise that messages in mbox files are RFC2822 objects, then your conclusions and arguments are appropriate and correct. However, that premise is demonstrably false, and most of the conclusions that follow from it are also false.

Let's be clear about this: mbox files ARE NOT collections of 2822 objects, but instead are app-specific databases of app-specific message objects. As such, the right way to handle them is to treat them as opaque databases, just as if they were Eudora databases, or Outlook databases, or anything else.

As a simple example of this principle, mbox files often contain relative email addresses with no domain qualifier (especially common with users who exchange messages with other local users, but this isn't limited to that context). OTOH, section 3.4.1 of RFC2822 unambiguously states that email addresses (those that use addr-spec) must be qualified. So, starting with this message from the E2E-Interest archives:

  ftp://ftp.isi.edu/end2end/end2end-interest-1985-1987.mail

  | Date: Wed, 3 Jun 87 13:18:28 PDT
  | From: Bob Braden <braden>
  | Message-Id: <8706032018.AA02832@xxxxxxxxxxxxxx>
  | To: end2end-interest
  | Cc: postel

What should an import agent do when it finds that, and wants to import it into an IMAP folder, or wants to remail the message? Should it append the local domain name? Should it append a default domain name? Should it reject the message because it doesn't conform to 2822 and because no assumptions are safe?

Clearly, the message is not conformant to 2822, and the data has to be analyzed in the context of the application which created it, rather than the application-neutral format that we all wish was being used. From that position, out-of-band negotiation over the formatting of the database is the only thing that will work.
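To make the import question concrete, here is a minimal Python sketch, not a recommended import procedure. It uses the standard library's email.utils.getaddresses to list header addresses that carry no domain, and it qualifies them only when the caller supplies a domain obtained out of band. The names unqualified_addresses and qualify, and the domain example.org, are hypothetical and chosen only for illustration.

```python
from email.utils import getaddresses

# Header fields taken from the E2E-Interest message quoted above.
HEADERS = {
    "From": "Bob Braden <braden>",
    "To": "end2end-interest",
    "Cc": "postel",
}


def unqualified_addresses(headers):
    """Return (field, address) pairs whose address has no "@domain" part."""
    found = []
    for field, value in headers.items():
        for _name, addr in getaddresses([value]):
            if addr and "@" not in addr:
                found.append((field, addr))
    return found


def qualify(addr, domain):
    """Append a domain to a bare local part.

    The domain must come from out-of-band knowledge about the system that
    wrote the mbox file; nothing in the file itself supplies it.
    """
    return addr if "@" in addr else f"{addr}@{domain}"


if __name__ == "__main__":
    for field, addr in unqualified_addresses(HEADERS):
        # "example.org" is a placeholder, not a guess at the real domain.
        print(field, addr, "->", qualify(addr, "example.org"))
```

The sketch deliberately has no default domain: the importer can detect that an address is relative, but only the sender of the database can say what it was relative to.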
Addresses are a minor example, and one that a lot of folks would gladly brush off as irrelevant, but there are a lot more examples, some of which are much more significant. Since the messages are not 2822-compliant objects, there is no guarantee (or even any reasonable assumption) that binary objects have been previously encoded into a safe format. Messages can easily contain untagged 8-bit characters and bare CR and/or LF, and can have lines thousands of characters in length, all of which violate other assumptions about 2822 formatting. Enforcing the 998-character line-length limit would destroy the data. Automatic EOL conversion (as implied by text/* media-type definitions) would destroy the data. And so forth.

What if two local users are exchanging big5-encoded messages, but neither mailer has tagged the messages with the appropriate MIME tags (not uncommon, and not a requirement, since these are not 2822-compliant, nor even MIME-compliant)? What assumptions should an importer make? The only safe thing here is out-of-band negotiation ("these are big5").

It would be nice if mbox files only contained 2822 objects. But they don't. They don't even have to conform to any one set of assumptions, and can contain messages in different charsets (all of which are untagged), or which have different relative domains (all of which are unspecified), or any number of other variances. As such, they have to be treated as opaque databases, not as collections of well-formed 2822 objects.

I also note that the digest media-type is already specified and is the appropriate interchange format if you actually do have a collection of well-formed 2822 objects. But if you have an mbox file, you have to exchange it as an opaque database, and you have to delineate any internal assumptions through out-of-band negotiations. The mbox media-type is only for tagging and identifying the data being exchanged ("here is an opaque database of unspecified message objects").
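The formatting problems listed above are easy to detect even though they cannot be safely repaired. The following Python sketch, with the hypothetical names scan_message and needs_opaque_handling, counts 8-bit bytes, bare CRs, bare LFs, and the longest line in one raw message. It assumes the caller has already split the mbox file into individual messages, which is itself format-specific.

```python
import re

MAX_LINE = 998  # RFC 2822 section 2.1.1: hard limit per line, excluding CRLF


def scan_message(raw: bytes) -> dict:
    """Count features of a raw message that RFC 2822 does not permit."""
    # Treat CRLF, bare CR, and bare LF all as line breaks when measuring length.
    lines = re.split(rb"\r\n|\r|\n", raw)
    return {
        "eight_bit_bytes": sum(1 for b in raw if b > 0x7F),
        "bare_cr": len(re.findall(rb"\r(?!\n)", raw)),
        "bare_lf": len(re.findall(rb"(?<!\r)\n", raw)),
        "longest_line": max(len(line) for line in lines),
    }


def needs_opaque_handling(report: dict) -> bool:
    """True if the message cannot be passed along as a plain 2822 object."""
    return (
        report["eight_bit_bytes"] > 0
        or report["bare_cr"] > 0
        or report["bare_lf"] > 0
        or report["longest_line"] > MAX_LINE
    )


if __name__ == "__main__":
    sample = b"From: braden\nTo: postel\n\n\xa4\xa4 untagged 8-bit text\n"
    report = scan_message(sample)
    print(report, needs_opaque_handling(report))
```

Note that an ordinary UNIX mbox message, which uses bare LF line endings, is already flagged by this check. The scan can say that a message is not a 2822 object, but it cannot say which charset the 8-bit bytes are in; that still has to be communicated out of band.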
Detailed responses to points follow:

On 8/10/2004 7:19 AM, Ian Jackson wrote:

> The IESG writes ("Last Call: 'The APPLICATION/MBOX Media-Type' to Proposed Standard "):
>
>>The IESG has received a request from an individual submitter to consider the
>>following document:
>>
>>- 'The APPLICATION/MBOX Media-Type '
>>   <draft-hall-mime-app-mbox-02.txt> as a Proposed Standard
>>
>>The IESG plans to make a decision in the next few weeks, and solicits
>>final comments on this action. Please send any comments to the
>>iesg@xxxxxxxx or ietf@xxxxxxxx mailing lists by 2004-09-06.
>
> I have the following comments:
>
> * This specification is incomplete. There are unresolved issues
>   regarding the semantics of the format.

This is not intended to serve as an authoritative reference to the mbox database format, but is only intended to provide an identifier for the database-type when it is transferred. Out-of-band negotiation is necessary in all cases, and I don't really think it's appropriate for the IETF to define an application-specific database format anyway. I'd like to see one, and I'd like to see whatever *NIX consortium is responsible for such things get together and define one.

> * Since mbox files are text files (assuming that any binary messages
>   in the mailbox are themselves encoded) and can be read sensibly
>   with the naked eye, the content type should be text/* not
>   application/*. This will also remove ambiguity surrounding line
>   endings.

Automatic EOL conversion destroys message objects. There is no assumption that messages are conformant with 2822, nor can there be any such assumption.

> * Since an mbox is actually an aggregate type - a way of encoding a
>   set of RFC822 messages - transfer encodings other than 7bit and
>   8bit should be discouraged. The spec should probably deprecate
>   them in most cases.

I don't see why.
It may be efficient to append an app/mbox database to a [2822-compliant] email message and to encode it as QP, or it may be necessary to encode it as base64 if it contains long lines of raw 8bit data.

> * The Proposed Standard should either include or refer to a specific
>   mbox format. The fact that there are variant implementations
>   doesn't mean that the Proposed Standard should hesitate to declare
>   those broken (at least, broken when a file is sent as text/mbox).
>   Those variant implementations are not wholly interoperable anyway,
>   and in order to write software which deals correctly with text/mbox
>   it will be necessary for the spec to say what the format is
>   supposed to be !

Well, this would require that every local application be changed to suit the needs of the transfer format, which is not reasonable. Instead the goal here is to define a transfer identifier which says that the data probably looks like XYZ.

> * The format specified should be that described in Rahu Dhesi's
>   posting to comp.mail.misc in 1996, <4ivk9s$bok@xxxxxxxxxxxxxxxx>.

The message with that message-ID does not define a format.

> * If an mbox file contains messages with unencoded binary data, the
>   file is difficult to sensibly process on a machine with non-UN*X
>   line-endings, because of the bare CRs in the binary data. (Bare
>   LFs are fine and look just like line endings, with From_-escaping
>   and all.) As far as I can tell there is then no non-lossy
>   representation of the file which allows sensible local processing
>   by non-mbox-specific tools. This issue should be resolved (or at
>   least acknowledged).

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/