On 2020-08-09 20:49:59-0400, Eric Sunshine wrote: > On Sun, Aug 9, 2020 at 7:06 PM Emma Brooks <me@xxxxxxxxxxx> wrote: > > Add an option to map names and emails to their canonical forms via a > > .mailmap file. This is enabled by default, consistent with the behavior > > of Git itself. > > > > Signed-off-by: Emma Brooks <me@xxxxxxxxxxx> > > --- > > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt > > @@ -751,6 +751,11 @@ default font sizes or lineheights are changed (e.g. via adding extra > > +mailmap:: > > + Use mailmap to find the canonical name/email for > > + committers/authors (see linkgit:git-shortlog[1]). Enabled by > > + default. > > Is this setting global or per-repository? (I ask because documentation > for other options in this section document whether they can be set > per-repository.) Global. I'll add a note that it cannot be set per-project, or I could add support for setting it per-project if that's wanted. > Should there be any sort of support for functionality similar to the > "mailmap.file" and "mailmap.blob" configuration options in Git itself? > (Genuine question, not a demand for you to implement such support.) Yes, that would be useful and should probably be supported. > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > > +# Contents of mailmap stored as a referance to a hash with keys in the format > > s/referance/reference/ OK. > > +# of "name <email>" or "<email>", and values that are hashes containing a > > +# replacement "name" and/or "email". If set (even if empty) the mailmap has > > +# already been read. > > +my $mailmap; > > + > > +sub read_mailmap { > > + my %mailmap = (); > > + open my $fd, '-|', quote_command( > > + git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap') . ' 2> /dev/null' > > + or die_error(500, 'Failed to read mailmap'); > > Am I reading this correctly that this will die if the project does not > have a .mailmap file? If so, that seems like harsh behavior since > there are many projects in the wild lacking a .mailmap file. No, this error message is misleading. The die_error is called if there is a problem executing git cat-file, but not if cat-file returns an error. I'll revise this message to be more accurate. > > + return \%mailmap if eof $fd; > > + foreach (split '\n', <$fd>) { > > If the .mailmap has no content, then the 'foreach' loop won't be > entered, which means the early 'return' above it is unneeded, correct? > (Not necessarily asking for the early 'return' to be removed, but more > a case of checking that I'm understanding the logic.) The early return is intended to catch when there is no mailmap, so $fd does not get initialized. Without it, you would get an error when you try to split $fd's content: Use of uninitialized value $fd in split at [the foreach] > > + next if (/^#/); > > + if (/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x || > > + /(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x) { > > + # New Name <new@email> <old@email> > > + # New Name <new@email> Old Name <old@email> > > The first regex is intended to handle a trailing "# comment", whereas > the second regex is for lines lacking a comment, correct? However, > because neither of these expressions are anchored, the second regex > will match both types of lines, thus the first regex is redundant. I'm > guessing, therefore, that your intent was actually to anchor the > expressions, perhaps like this: > > if (/^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x || > /^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) { > > Also, if you're matching lines of the form: > > name1 <email1> [optional-name] <email2> > > in which you expect to see "name1", then is the loose "(.*)\s+" > desirable? Shouldn't it be tighter "(.+)\s+"? For instance: > > if (/^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x || > /^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) { Yes and yes. I'll update those. > > + $mailmap{$3} = (); > > I wonder if you should be doing some sort of whitespace normalization > on $3 before using it as a hash key. For instance, if someone has a > .mailmap that looks like this (where I've used "." to represent > space): > > name1.<email1>.name2...<email2> > > then $3 will have three spaces between 'name2' and '<email2>' when > used as a key, and that won't match later when you construct a "name > <email>" key later in map_author() with a single space. Yes, I hadn't considered that case. Thanks.