Csókás Bence <csokas.bence@xxxxxxxxx> writes: > On 6/21/24 19:27, Junio C Hamano wrote: >> Hmph, there is this piece of code before the block: >> if ($c !~ /.+@.+|<.+>/) { >> printf("(body) Ignoring %s from line '%s'\n", >> $what, $_) unless $quiet; >> next; >> } >> This is to reject strings that do not even look like an e-mail >> address, but we called sanitize_address() call on $c way before this >> check. I wonder if we should move this block way up, even before >> the call to santize_address()? >> That was a relatively unrelated tangent. > > Hm, maybe. Should I add this to the patch as well? No, it is an unrelated tangent. We try not to mix unrelated things into a single patch. Perhaps as a separate "clean-up" patch after the dust from the main topic settles, or as a preliminary "clean-up" before the main topic, would be good (but see below). >> It looks to me that there are many other places that we try to be as >> faithful to the original as possible. In the same block as the one >> that handles "Cc:" I quoted above, an address on "From:" is also >> sent intact into @cc and addresses on "To:" are handled the same >> way. >> The patch under discussion singles out the addresses on the trailers >> in the message body and treat them differently from others, which I >> am not sure is what we want to do. > > I think it is a reasonable assumption that the mbox headers will be > conforming, whereas the message body is just freeform text and no such > guarantees exist. But if we want to be paranoid about it, we could try > and sanitize everything. I actually do not think we want to be more paranoid. I'd rather trust what the user gave us, which was where my response came from. Having said that, given that the "Ignoring ..." check we saw earlier in this message triggers only for trailers, it may be a sensible position to take that mail headers are more trustworthy and a random address-looking strings on the trailer lines should be checked more carefully. Because the "Ignoring ..." check is primarily to reject strings that are not addresses (things like bug IDs, or just people's names without e-mail addresses to credit for a new feature request) that we do not ever intend to actually Cc: the message to, another sensible choice coming from that "strings on trailers may not even be addresses" position may be not to add the $c to the Cc: list, if $sc (the sanitized address) and $c (the original address) are different. That is, "the simple regexp check currently used to trigger 'Ignoring ...' message thought that the string looked like an address, but when we ask sanitize_address, it turns out that it was not, really." And if we were to take that route, "Ignoring ..." check, which I called an unrelated tangent, would become very much related to the topic at hand, as it would mean the resulting logic would look more like this: my $sc = sanitize_address($c); if ($c !~ /.+@.+|<.+>/ || $sc ne $c) { printf("(body) Ignoring %s from line '%s'\n", $what, $_) unless $quiet; next; } In any case, if we were to move forward with this topic (whether "send to corrected $sc instead" or "$c is suspicious, do not add it to $cc" is picked as the direction), the motivation behind the design decision to treat the address taken from a trailer line differently needs to be explained better, I think. I had a chance to ask you directly on list, and you gave a good explanation to me in the message I am responding to, but you will not be around when future readers of "git log" want to ask the same question "why are we singling this out while we use the original address in all the other places?". Your proposed commit log message is the place to help them. Oh, before I forget, is this something we can test easily in t9001? We would want to protect a new behaviour from accidental breakage, so adding a test or two would be a good thing. Thanks.