On Thu, 12 Aug 2004 17:18:19 EDT, Tony Hansen said:
> The information about the mbox format being anecdotally defined is
> incorrect. The mbox format has traditionally been documented in the
> binmail(1) or mail.local(8) man pages (BSD UNIX derivatives) or mail(1)
> man page (UNIX System 3/5/III/V derivatives). There have been several
> variants of the mbox format in use by those different systems. The most
> complete description of an mbox format can be seen in the man page from
> any UNIX System Vr4 derived system, such as Solaris.

Umm.. Tony? I hate to say it, but if there have been several variants
used in the wild, and the man pages for said variants document different
formats, that's awfully close to "anecdotally defined" when you're doing
a standard.

For example, a Solaris 8 box across the hall says in 'man mail.local':

     Each delivered mail message in the mailbox is preceded by a
     "Unix From line" with the following format:

          From sender_address time_stamp

     The sender_address is extracted from the SMTP envelope address
     (the envelope address is specified with the -f option).

     A trailing blank line is also added to the end of each message.

Hmm. Nothing about whether the sender_address is, or should be,
<bracketed>. Nothing about the format of the time_stamp. Nothing about
'>From ' stuffing (and yes, I've seen systems that don't do it at all,
and systems that only >-stuff if the From line matched a regexp for what
*they* think the entire 'From ' line looks like(*)).

The Sendmail 8.13.1 mail.local does say >-stuffing happens for lines
"which could be mistaken for a ``From '' delimiter line", and the code
actually checks for exactly 5 chars ('From ')...

Any doubts that this whole mess is at best anecdotally defined can be
dispelled by mentioning "Content-Length:" (interestingly enough, not even
mentioned in the Solaris or Sendmail man pages, although the Sendmail
source tree does mention that building on Solaris 2.3 or later will turn
it on).
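To make the two >-stuffing behaviours concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not code from Sendmail or Solaris. stuff_from_lines follows the rule described for Sendmail's mail.local (stuff any body line beginning with the 5 chars 'From '). stuff_strict and its STRICT_FROM regex are hypothetical, standing in for software that only stuffs lines matching its own idea of a full ctime-style 'From ' line. content_length is likewise a hypothetical helper showing the byte count a Content-Length: header would have recorded.

```python
import re

# Rule as described for Sendmail 8.13.1 mail.local: prefix ">" to any body
# line that starts with exactly the 5 characters "From ".
def stuff_from_lines(body: str) -> str:
    out = []
    for line in body.splitlines(keepends=True):
        out.append(">" + line if line.startswith("From ") else line)
    return "".join(out)


# Hypothetical "strict" variant: only stuff lines that look like a complete
# ctime-style From line. The regex is illustrative, not from any real MTA.
STRICT_FROM = re.compile(
    r"From \S+ +[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d\d:\d\d:\d\d \d{4}$")


def stuff_strict(body: str) -> str:
    out = []
    for line in body.splitlines(keepends=True):
        matched = STRICT_FROM.match(line.rstrip("\r\n"))
        out.append(">" + line if matched else line)
    return "".join(out)


# Hypothetical helper: the byte count a Content-Length: header would record.
def content_length(body: str) -> int:
    return len(body.encode("utf-8"))


body = "Hi,\nFrom what I hear, mbox is fine.\n"
print(repr(stuff_from_lines(body)))  # the 'From ' line gets a leading '>'
print(repr(stuff_strict(body)))      # unchanged: not a "full" From line
print(content_length(stuff_from_lines(body)) - content_length(body))  # 1
```

With that body, the first function turns the second line into '>From what I hear...', while the strict variant leaves it alone, so a reader that splits on every line starting with 'From ' would see a bogus message boundary there. And any Content-Length: computed before stuffing is now off by one byte per stuffed line.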
Content-Length: is of interest mostly because it's so easily broken by
later >-stuffing/unstuffing or other similar conversion...

(*) time_stamp. Argh. Fought with this during a data/machine migration.
Write code that will accept a 26-byte ctime format:

     'Fri Sep 13 00:00:00 1986\n\0'

Works fine once you realize that some systems just used
'From envelope_address' without a timestamp. Then I get handed this:

     'Fri Aug 13 20:21:32 EDT 2004'

Fix that, and find some joker running in a French locale:

     'vendredi, 13 août 2004, 20:22:01 EDT'

And yes, his b0rked software only >-stuffed 'From ' lines that
regexp-matched the *French* variant. Took me *quite* some time to twig
to THAT one...
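A tolerant reader for the time_stamp zoo in the footnote might look something like the Python sketch below. It is a sketch under assumptions, not a reference parser: parse_from_line, the two regexes, and the month tables are all hypothetical, built only to accept the four shapes quoted above (plain ctime, no timestamp at all, ctime with a zone name before the year, and the French-locale form). Month names are mapped by hand rather than via strptime, so the result does not depend on the current LC_TIME locale; zone names are returned as raw strings rather than resolved.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

EN_MONTHS = {m: i for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}
# Hypothetical table covering the French-locale variant quoted above.
FR_MONTHS = {m: i for i, m in enumerate(
    ["janvier", "février", "mars", "avril", "mai", "juin", "juillet",
     "août", "septembre", "octobre", "novembre", "décembre"], start=1)}

# 'Fri Sep 13 00:00:00 1986' or 'Fri Aug 13 20:21:32 EDT 2004'
CTIME_RE = re.compile(
    r"[A-Z][a-z]{2} (?P<mon>[A-Z][a-z]{2}) +(?P<day>\d{1,2}) "
    r"(?P<time>\d\d:\d\d:\d\d)(?: (?P<tz>[A-Z]{2,5}))? (?P<year>\d{4})$")
# 'vendredi, 13 août 2004, 20:22:01 EDT' (weekday name is ignored)
FRENCH_RE = re.compile(
    r"\w+, (?P<day>\d{1,2}) (?P<mon>\w+) (?P<year>\d{4}), "
    r"(?P<time>\d\d:\d\d:\d\d)(?: (?P<tz>[A-Z]{2,5}))?$")


def parse_from_line(line: str) -> Tuple[str, Optional[datetime], Optional[str]]:
    """Split an mbox 'From ' line into (address, naive datetime or None, zone or None).

    The address is kept exactly as found, with or without <brackets>.
    """
    line = line.rstrip("\r\n")
    if not line.startswith("From "):
        raise ValueError("not a 'From ' line")
    parts = line[5:].split(" ", 1)
    address = parts[0]
    rest = parts[1].strip() if len(parts) > 1 else ""
    if not rest:
        return address, None, None  # 'From envelope_address', no timestamp
    for regex, months in ((CTIME_RE, EN_MONTHS), (FRENCH_RE, FR_MONTHS)):
        m = regex.match(rest)
        if m and m.group("mon") in months:
            hh, mm, ss = map(int, m.group("time").split(":"))
            when = datetime(int(m.group("year")), months[m.group("mon")],
                            int(m.group("day")), hh, mm, ss)
            return address, when, m.group("tz")
    raise ValueError(f"unrecognized time_stamp: {rest!r}")
```

Each new variant found in the wild means another regex and month table, which is rather the point: the format is whatever the last system to write the mailbox thought it was.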
_______________________________________________ Ietf@xxxxxxxx https://www1.ietf.org/mailman/listinfo/ietf