On Wed, Sep 14, 2016 at 12:30:06PM -0700, Junio C Hamano wrote:

> Another small thing I am not sure about is if the \ quoting can hide
> an embedded newline in the author name. Would we end up turning
>
>     From: "Jeff \
>     King" <peff@xxxxxxxx>
>
> or somesuch into
>
>     Author: Jeff
>     King
>     Email: peff@xxxxxxxx
>
> ;-)

Heh, yeah. That is another reason to clean up and sanitize as much as
possible before stuffing it into another text format that will be
parsed.

> So let's roll the \" -> " into mailinfo.
>
> I am not sure if we also should remove the surrounding "", i.e. we
> currently do not turn this
>
>     From: "Jeff King" <peff@xxxxxxxx>
>
> into this:
>
>     Author: Jeff King
>     Email: peff@xxxxxxxx
>
> I think we probably should, and remove the one that does so from the
> reader.

I think you have to, or else you cannot tell the difference between
surrounding quotes that need to be stripped, and ones that were
backslash-escaped. Like:

  From: "Jeff King" <peff@xxxxxxxx>
  From: \"Jeff King\" <peff@xxxxxxxx>

which would both become:

  Author: "Jeff King"
  Email: peff@xxxxxxxx

though I am not sure the latter one is actually valid; you might need to
be inside syntactic quotes in order to include backslashed quotes. I
haven't read rfc2822 carefully recently enough to know.

Anyway, I think that:

  From: One "Two \"Three\" Four" Five

may also be valid. So the quote-stripping in the reader is not just "at
the outside", but may need to handle interior syntactic quotes, too.

So it really makes sense for me to clean and sanitize as much as
possible in one step, and then make the parser of mailinfo as dumb as
possible.

-Peff
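
For illustration only, here is a rough sketch of the "sanitize everything in one step" idea. It is not git's mailinfo code; the function name `unquote_display_name` is hypothetical, and two behaviors are assumptions rather than anything settled in this thread: a backslash is treated as an escape even outside syntactic quotes (which may be more lenient than the RFC allows), and any run of whitespace, including an escaped newline, is collapsed to a single space.

```python
def unquote_display_name(raw: str) -> str:
    """Strip syntactic quotes and resolve backslash escapes in one pass.

    Hypothetical helper, not part of git. Assumptions: backslash escapes
    are honored everywhere, and whitespace runs collapse to one space.
    """
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            # Backslash escape: keep the next character literally,
            # so an escaped quote survives as a real quote character.
            i += 1
            out.append(raw[i])
        elif ch == '"':
            # Unescaped quote: purely syntactic, so drop it. This covers
            # interior quoted runs as well as quotes around the whole name.
            pass
        else:
            out.append(ch)
        i += 1
    # Collapse whitespace so an embedded newline cannot split the name
    # across lines in the output that is parsed later.
    return " ".join("".join(out).split())


if __name__ == "__main__":
    samples = [
        '"Jeff King"',
        '\\"Jeff King\\"',
        'One "Two \\"Three\\" Four" Five',
        '"Jeff \\\nKing"',
    ]
    for sample in samples:
        print(repr(sample), "->", repr(unquote_display_name(sample)))
```

Because syntactic and escaped quotes are resolved in the same pass, the first two samples no longer collapse to the same output, and the reader of the result would not need any quote handling of its own.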