Hi! On 6/21/24 19:27, Junio C Hamano wrote:
Hmph, there is this piece of code before the block: if ($c !~ /.+@.+|<.+>/) { printf("(body) Ignoring %s from line '%s'\n", $what, $_) unless $quiet; next; } This is to reject strings that do not even look like an e-mail address, but we called sanitize_address() call on $c way before this check. I wonder if we should move this block way up, even before the call to santize_address()? That was a relatively unrelated tangent.
Hm, maybe. Should I add this to the patch as well?
In the same function, there is this snippet about Cc: (I didn't check if the same issue is shared with other header fields):
Yes, I've seen that. However, the mbox headers are more likely to conform to RFC 2047, or at least `git format-patch` generates correct mbox headers. (Before sending the original patch, I was playing around with purposefully broken mbox files, and at least Thunderbird seems to correctly parse these non-conforming mbox headers in _most_ cases, but that's probably just extra caution on Mozilla's side).
It looks to me that there are many other places that we try to be as faithful to the original as possible. In the same block as the one that handles "Cc:" I quoted above, an address on "From:" is also sent intact into @cc and addresses on "To:" are handled the same way. The patch under discussion singles out the addresses on the trailers in the message body and treat them differently from others, which I am not sure is what we want to do.
I think it is a reasonable assumption that the mbox headers will be conforming, whereas the message body is just freeform text and no such guarantees exist. But if we want to be paranoid about it, we could try and sanitize everything.
Bence