Re: Make "git am" properly unescape lines matching ">>*From "

On Tue, 08 Jun 2010 13:50:08 -0700, "H. Peter Anvin" <hpa@xxxxxxxxx> wrote:
> On 06/08/2010 12:57 PM, Carl Worth wrote:
> > When I did that, I was careful to escape lines from the bodies of email
> > messages that begin with zero or more '>' characters followed
> > immediately by "From " (From_ lines) by adding an initial '>'. [2]
...
> The problem with that is that it is not universally applied.

Right. And since I can't fix this universe, I'd like to at least start
by getting notmuch and git to use the same thing. Currently, git uses a
non-standard, not-quite-safe mbox format, while notmuch doesn't yet emit
anything like mbox. So we have a nice opportunity to fix these two
projects so that they at least work well together (if we can agree on a
format).

> As far as I can tell, the Content-Length: is the most reliably handled
> format and probably is what we should use.  This is the "mboxcl2" format
> in your list.[*]  Unfortunately "mboxcl2" and "mboxrd" cannot be
> distinguished from each other by inspection, which is a major defect of
> both formats.

What do you mean by "most reliably handled format"?

Of the four mbox formats listed on the page I cited[*], "mboxo" and
"mboxcl" are easy to discard as they both irreversibly corrupt messages.

That leaves both "mboxrd" and "mboxcl2" as candidates. Either of these
formats is reliable if both the reader and writer use the same
format. When the reader and writer don't agree, then there are problems
as follows ("W:" indicates writing, "R:" indicates reading expecting a
particular format):

W:mboxrd  then R:mboxcl2 -> Reader may corrupt by failing to remove '>'
			    Reader must give up/guess without CL headers
			    Guessing is at least unlikely to mis-split messages

W:mboxcl2 then R:mboxrd  -> Reader may corrupt by erroneously removing '>'
			    Reader may mis-split messages on "From " in content
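
To make the '>' half of these two cases concrete, here is a minimal
Python sketch. It is not code from git or notmuch; the function names
are made up for illustration, and it only models the quoting rule, not
message splitting or Content-Length handling.

```python
import re

def mboxrd_quote(line):
    # mboxrd writer rule: add one '>' to any line matching ">*From "
    return ">" + line if re.match(r">*From ", line) else line

def mboxrd_unquote(line):
    # mboxrd reader rule: strip one '>' from any line matching ">+From "
    return line[1:] if re.match(r">+From ", line) else line

original = ">From the quoted text of an earlier mail"

# W:mboxcl2 then R:mboxrd: the writer stores the line untouched,
# then the reader erroneously removes a '>'.
read_as_mboxrd = mboxrd_unquote(original)
print(read_as_mboxrd)   # From the quoted text of an earlier mail

# W:mboxrd then R:mboxcl2: the writer adds a '>',
# then the reader fails to remove it.
read_as_mboxcl2 = mboxrd_quote(original)
print(read_as_mboxcl2)  # >>From the quoted text of an earlier mail

# Matching writer and reader: the line survives unchanged.
round_trip = mboxrd_unquote(mboxrd_quote(original))
print(round_trip == original)  # True
```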

I preferred to implement mboxrd over mboxcl2 for several reasons:

  1. The mboxrd writer implementation is much simpler. This format
     affords a simple streaming implementation where mboxcl2 requires
     knowing the length of the message in advance.

  2. The mboxrd format is robust in the face of file changes that
     invalidate the Content-Length headers, (for example, a person
     can hand-edit an mboxrd file without invalidating it, but cannot do
     the same with an mboxcl2 file).

  3. The mboxrd reader implementation is much simpler. An mboxcl2 reader
     necessarily has special cases that an mboxrd implementation does
     not. What to do if there is no Content-Length header? What to do if
     the Content-Length header appears wrong? etc. Recovery code for
     these cases might well be to fall back to something like an mboxrd
     implementation, which demonstrates the increased complexity here.
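
As a rough illustration of point 1, a streaming mboxrd writer can be
sketched in a few lines of Python. This is only a sketch under
assumptions: write_mboxrd is a hypothetical name, the timestamp on the
"From " separator line is a fixed placeholder, and the input is assumed
to be an iterable of complete lines as bytes. The writer never needs to
see the whole message, whereas an mboxcl2 writer would have to know the
body length before it could emit the Content-Length header.

```python
import io
import re

FROM_LINE = re.compile(rb">*From ")

def write_mboxrd(out, envelope_sender, lines):
    """Append one message to `out`, consuming `lines` one at a time."""
    # Placeholder timestamp: a real writer would use the delivery date.
    out.write(b"From " + envelope_sender + b" Thu Jan  1 00:00:00 1970\n")
    for line in lines:
        if FROM_LINE.match(line):
            out.write(b">")   # quote (possibly already quoted) From_ lines
        out.write(line)
    out.write(b"\n")          # blank line terminating the message

# Usage: the message arrives from a generator, so its length is not
# known up front.
def message_lines():
    yield b"Subject: test\n"
    yield b"\n"
    yield b"From here on, this is body text\n"
    yield b">From an earlier mail\n"

buf = io.BytesIO()
write_mboxrd(buf, b"user@example.com", message_lines())
```

Header lines pass through untouched because "From:" is followed by a
colon rather than a space, so only From_ look-alikes gain a '>'.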

As can be seen in my patch, implementing an mboxrd reader in
git-mailsplit was quite simple. An mboxcl2 reader would be quite a bit
more complicated, with no actual benefit in reliability (assuming that
the reader matches the writer).
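
For comparison, the whole of an mboxrd reader fits in a short Python
sketch. This is not the git-mailsplit patch itself, just an illustration
of the idea; read_mboxrd is a hypothetical name, and the input is
assumed to be an iterable of complete lines as bytes.

```python
import re

QUOTED_FROM = re.compile(rb">+From ")

def read_mboxrd(lines):
    """Yield each message as a list of lines with From_ quoting undone."""
    message = None
    for line in lines:
        if line.startswith(b"From "):
            # An unquoted From_ line always starts a new message.
            if message is not None:
                yield message
            message = []
        elif message is not None:
            # Strip exactly one '>' from lines matching ">+From ".
            message.append(line[1:] if QUOTED_FROM.match(line) else line)
    if message is not None:
        yield message

# Usage
mbox = (
    b"From a@example.com Thu Jan  1 00:00:00 1970\n"
    b"Subject: one\n"
    b"\n"
    b">From the top\n"
    b"\n"
    b"From b@example.com Thu Jan  1 00:00:00 1970\n"
    b"Subject: two\n"
    b"\n"
    b">>From a quoted reply\n"
    b"\n"
)
messages = list(read_mboxrd(mbox.splitlines(keepends=True)))
```

There is no header parsing and no recovery path. The sketch does leave
the blank separator line at the end of each message, and it ignores
anything before the first "From " line.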

> The statement that "the entire "mbox" family of mailbox formats is
> gradually becoming irrelevant, and of only historical interest" is also
> pretty silly -- mbox is still the preferred format for moving groups of
> email from MUA to MUA, even if it is no longer used for active live
> spool storage.  But, of course, you knew that already.

Indeed. Though I was surprised to find recently that postfix still
delivers to /var/mail/$user in "mboxo" format by default (ugh).

-Carl

[*] http://homepage.ntlworld.com/jonathan.deboynepollard/FGA/mail-mbox-formats.html

-- 
carl.d.worth@xxxxxxxxx
