Re: [PATCH] git-send-email: fix get_maintainer.pl regression

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Eric Sunshine <sunshine@xxxxxxxxxxxxxx> writes:

> A few more comments/observations...
>
> On Thu, Nov 16, 2017 at 10:48 AM, Alex Bennée <alex.bennee@xxxxxxxxxx> wrote:
>> diff --git a/perl/Git.pm b/perl/Git.pm
>> @@ -936,6 +936,9 @@ sub parse_mailboxes {
>>                         $end_of_addr_seen = 0;
>>                 } elsif ($token =~ /^\(/) {
>>                         push @comment, $token;
>> +               } elsif ($token =~ /^\)/) {
>> +                       my $nested_comment = pop @comment;
>> +                       push @comment, "$nested_comment$token";
>
> Due to the way tokenization works, it looks like you will only ever
> see a ")" as a single character. That suggests that you should be
> using ($token eq ")"), as is done for "<" and ">", rather than ($token
> =~ /^\)/).
>
> What happens if there is text before the final closing ')'? For
> instance, "foo@bar (bibble (bobble) smoo)" or "...)smoo)". The result
> is that "smoo" ends up tacked onto the end of the email address
> ("foo@barsmoo") rather than incorporated into the comment, as
> intended.
>
> What happens if you encounter a ")" but haven't yet encountered an
> opening "(" (that is, @comment is empty)? For example, "foo@bar )". In
> that case, it unconditionally pops from the empty array, which seems
> iffy at best. It might be nice to see this case taken into
> consideration explicitly.

Yeah I was only really aiming for the current regression but I'm sure it
could be more solid. I do note that my @known_failure_list in test.pl
has a bunch of other cases that need fixing up.

> I also was wondering if it would make more sense to take advantage of
> Perl's ability to match nested expressions (??{$nested}), however,
> that feature apparently was added in 5.10, and Git.pm only requires
> 5.8, so perhaps not (unless we want to bump the requirement higher).

Hmm that might be a case of abusing regex to do something better suited
to a proper tokenizer.

>
> Aside from those observations, it looks like the tokenizer in this
> function is broken. For any input with the address enclosed in "<" and
> ">", the comment is lost entirely; it doesn't even end up in the
> @tokens array. Since you're already fixing bugs/regressions in this
> code, perhaps that's something you'd like to tackle as well in a
> separate patch? ("No" is an acceptable answer, of course.)
>
>>                 } elsif ($token eq "<") {
>>                         push @phrase, (splice @address), (splice @buffer);
>>                 } elsif ($token eq ">") {

I can have a go but my perl-fu has weakened somewhat since I stopped
having to maintain perl code for a living. It's almost as though my
brain was glad to dump the knowledge ;-)

I guess we could maintain a nesting count and a current token type and
use that to more intelligently direct the nested portions to the
appropriate bits. Maybe Matthieu or Remi (CC'ed) might want to chime in
on other options?

--
Alex Bennée




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux