On Tue, May 15, 2007 at 11:34:02PM -0700, Junio C Hamano wrote:

> I think you are trying to figure out how vger adds/munges the
> headers, and the above is not very useful for people but
> yourself unless you explicitly say what headers you gave on your
> end in the body of the message, is it?

Yes, I'm sorry that the message itself looked a bit vague. It was
actually about the 4th or 5th such message I sent, as the list filter
kept blocking the previous ones, so with each iteration I made the
message shorter and shorter to try to remove any offending text. So Karl
and Bruce actually received several explanatory messages that everyone
else didn't, and I really only expected them to be replying.

> Judging from the list responses, I am guessing the situation is
> like this. Does that match your understanding?

Yes, this is close.

> outgoing:
>     body in utf-8
>     Content-type: text/plain; charset=utf-8
>     no MIME-Version: header
>
> vger relayed to recipients:
>     body untouched
>     Content-type: text/plain; charset=iso-8859-1
>     MIME-Version: 1.0

There is also a "Content-Transfer-Encoding: 8bit" that gets switched to
quoted-printable (and the body is actually encoded as QP). However, the
change of charset is the problem.

> I am not sure what exactly you meant by with/without "the right
> mime header", but the above is based on my guess that you meant
> only MIME-VERSION header.

Yes, the two messages differed _only_ in the presence of a MIME-Version
header.

So now that I have the data, let me explain the sequence of events in
the bug, which should hopefully explain what everyone has seen.

  1. Bruce generates a message containing utf8 characters in the body
     and the following headers:

       Content-Type: text/plain; charset=utf-8
       Content-Transfer-Encoding: 8bit

     and _no_ MIME-Version header. This is produced by git-format-patch
     on a commit with non-ascii in the body. The message is sent to
     vger, and cc'd to some recipients, and we move to step 2.

  2a. The cc'd recipients receive one copy of the message intact; none
      of the mailservers along the route do any munging. This message
      is done.

  2b. vger sends the message to each list member in turn. What the user
      sees depends on the mail route. We move to step 3.

  3a. If the next hop from vger advertises 8BITMIME in the SMTP session,
      then vger submits the message intact. This is the case for me, so
      I see all messages intact (which is why I needed responses from
      others -- specifically, I knew Karl and Bruce were seeing the
      problem). This message is done.

  3b. If the next hop does not advertise 8BITMIME, vger must convert the
      message to a 7bit encoding (it chooses quoted-printable). Continue
      to step 4.

  4a. If the message has valid MIME headers, then vger can simply
      encode, re-writing the content-transfer-encoding to
      quoted-printable and encoding the body. vger considers valid mime
      headers to be a MIME-Version header and a content-type header.
      This is the case for the second message I sent, which appears
      correctly to all recipients.

  4b. If the message doesn't have valid MIME headers, then vger adds the
      headers. Without a MIME-Version header, it ignores the
      content-type and guesses at a suitable one, using text/plain with
      some totally arbitrary local charset (in this case "iso-8859-1").
      This message has now been incorrectly munged (claims latin1
      charset, but has utf8 characters).

vger puts an explanation into the X-Warning headers of the munged
message (the only unexplained thing that I had to test is that
MIME-Version is critical to vger believing the current content-type).

So recipients see the bug IFF
  the original has utf8 characters AND
  the original lacks a MIME-Version header AND
  their mailserver doesn't claim 8BITMIME

Interestingly, RFC 1428 claims that in this case vger should actually
set the charset to "unknown-8bit":

    If no information about the character set in use is available, the
    gateway should upgrade the content by using the character set
    "unknown-8bit".
    The unknown-8bit value of the charset parameter indicates only that
    no reliable information about the character set(s) used in the
    message was available.

Though that really just pushes the problem to the recipient's MUA, and I
have no idea what the handling of "unknown-8bit" is like there.

-Peff
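
The sequence of events described above (steps 3a through 4b) can be
sketched in Python roughly as follows. This is only an illustrative
model of the behavior reported in this message, not vger's actual code:
the function name relay(), the plain dict used for headers (with
exact-case keys), and the pass-through of bodies that contain no 8-bit
bytes are all assumptions made for the example.

```python
import quopri

def relay(headers, body, next_hop_8bitmime, guessed_charset="iso-8859-1"):
    """Hypothetical model of the relay steps 3a-4b described above.

    headers: dict of header name -> value (exact-case keys, a simplification)
    body: raw message body as bytes
    next_hop_8bitmime: True if the next hop advertised 8BITMIME
    """
    headers = dict(headers)  # work on a copy, leave the caller's dict alone
    has_8bit = any(byte > 0x7F for byte in body)

    # Step 3a: the next hop accepts 8-bit data, so nothing is changed.
    # (Passing 7bit-clean bodies through untouched is an assumption.)
    if next_hop_8bitmime or not has_8bit:
        return headers, body

    # Step 3b: the body must be downgraded to a 7bit encoding.
    valid_mime = "MIME-Version" in headers and "Content-Type" in headers
    if not valid_mime:
        # Step 4b: the existing Content-Type is not trusted without
        # MIME-Version, so a charset is guessed. This is the bug.
        headers["MIME-Version"] = "1.0"
        headers["Content-Type"] = "text/plain; charset=" + guessed_charset

    # Steps 4a and 4b both end with a quoted-printable body.
    headers["Content-Transfer-Encoding"] = "quoted-printable"
    return headers, quopri.encodestring(body)

if __name__ == "__main__":
    original = {
        "Content-Type": "text/plain; charset=utf-8",
        "Content-Transfer-Encoding": "8bit",
    }
    body = "na\u00efve patch\n".encode("utf-8")
    out_headers, out_body = relay(original, body, next_hop_8bitmime=False)
    print(out_headers["Content-Type"])
    # What a recipient's MUA shows if it believes the guessed charset:
    print(quopri.decodestring(out_body).decode("iso-8859-1"))
```

Passing guessed_charset="unknown-8bit" instead would model the RFC 1428
suggestion quoted above; what an MUA then does with that value is
outside what this sketch covers.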