Eric Sunshine <sunshine@xxxxxxxxxxxxxx> writes: > A few more comments/observations... > > On Thu, Nov 16, 2017 at 10:48 AM, Alex Bennée <alex.bennee@xxxxxxxxxx> wrote: >> diff --git a/perl/Git.pm b/perl/Git.pm >> @@ -936,6 +936,9 @@ sub parse_mailboxes { >> $end_of_addr_seen = 0; >> } elsif ($token =~ /^\(/) { >> push @comment, $token; >> + } elsif ($token =~ /^\)/) { >> + my $nested_comment = pop @comment; >> + push @comment, "$nested_comment$token"; > > Due to the way tokenization works, it looks like you will only ever > see a ")" as a single character. That suggests that you should be > using ($token eq ")"), as is done for "<" and ">", rather than ($token > =~ /^\)/). > > What happens if there is text before the final closing ')'? For > instance, "foo@bar (bibble (bobble) smoo)" or "...)smoo)". The result > is that "smoo" ends up tacked onto the end of the email address > ("foo@barsmoo") rather than incorporated into the comment, as > intended. > > What happens if you encounter a ")" but haven't yet encountered an > opening "(" (that is, @comment is empty)? For example, "foo@bar )". In > that case, it unconditionally pops from the empty array, which seems > iffy at best. It might be nice to see this case taken into > consideration explicitly. Yeah I was only really aiming for the current regression but I'm sure it could be more solid. I do note that my @known_failure_list in test.pl has a bunch of other cases that need fixing up. > I also was wondering if it would make more sense to take advantage of > Perl's ability to match nested expressions (??{$nested}), however, > that feature apparently was added in 5.10, and Git.pm only requires > 5.8, so perhaps not (unless we want to bump the requirement higher). Hmm that might be a case of abusing regex to do something better suited to a proper tokenizer. > > Aside from those observations, it looks like the tokenizer in this > function is broken. For any input with the address enclosed in "<" and > ">", the comment is lost entirely; it doesn't even end up in the > @tokens array. Since you're already fixing bugs/regressions in this > code, perhaps that's something you'd like to tackle as well in a > separate patch? ("No" is an acceptable answer, of course.) > >> } elsif ($token eq "<") { >> push @phrase, (splice @address), (splice @buffer); >> } elsif ($token eq ">") { I can have a go but my perl-fu has weakened somewhat since I stopped having to maintain perl code for a living. It's almost as though my brain was glad to dump the knowledge ;-) I guess we could maintain a nesting count and a current token type and use that to more intelligently direct the nested portions to the appropriate bits. Maybe Matthieu or Remi (CC'ed) might want to chime in on other options? -- Alex Bennée