On Wed, Sep 21, 2016 at 7:09 PM, Jakub Narębski <jnareb@xxxxxxxxx> wrote: > W dniu 21.09.2016 o 13:44, Ævar Arnfjörð Bjarmason napisał: > >> Change the log formatting function to know about "git describe" output >> like v2.8.0-4-g867ad08 in addition to just plain 867ad08. > > All right, that is a good plan. > >> >> This also fixes a micro-regression in my change of the minimum SHA1 >> length from 8 to 7, which is that dated tags like >> hadoop-20160921-113441-20-g094fb7d would start thinking the "20160921" >> part was a commit. > > Actually 20160921 is 8 characters, so assuming that '-' is treated > as word boundary by Perl, it is not a regression; this false positive > was there. The new feature would help, instead of linking false match > it links whole git-describe output. > > So this paragraph needs to be changed wrt. the above. Yeah I just miscounted there, will change that. > Note that there are quite a bit of shortened SHA-1 that are composed > entirely from digits, without a-f characters. *nod* >> >> There are still many valid refnames that we don't link to >> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to >> v2.8.0-4-g867ad08, but I'm not supporting that with this commit, >> similarly it's trivially possible to create some refnames like >> "æ/var-gf6727b0" or whatever which won't be picked up by this regex. > > Hopefully hierarchical tags are rare. We need to reduce false > positives. > >> >> There's surely room for improvement here, but I just wanted to address >> the very common case of sticking "git describe" output into commit >> messages without trying to link to all possible refnames, that's going >> to be a rather futile exercise given that this is free text, and it >> would be prohibitively expensive to look up whether the references in >> question exist in our repository. > > Note that we do not ask Git at the time of displaying commit message > if the link is valid for performance reasons; we link it, and the link > may be invalid if it was a false positive. > > Note that recommended way to refer to other commit in commit mesages > is (see Documentation/SubmittingPatches): > > If you want to reference a previous commit in the history of a stable > branch, use the format "abbreviated sha1 (subject, date)", > with the subject enclosed in a pair of double-quotes, like this: > > Commit f86a374 ("pack-bitmap.c: fix a memleak", 2015-03-30) > noticed that ... > > Hmmm... this makes previous commit even more important. > >> --- >> gitweb/gitweb.perl | 18 ++++++++++++++++-- >> 1 file changed, 16 insertions(+), 2 deletions(-) >> >> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl >> index 101dbc0..3a52bc7 100755 >> --- a/gitweb/gitweb.perl >> +++ b/gitweb/gitweb.perl >> @@ -2036,10 +2036,24 @@ sub format_log_line_html { >> my $line = shift; >> >> $line = esc_html($line, -nbsp=>1); >> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{ >> + $line =~ s{ >> + \b >> + ( >> + # The output of "git describe", e.g. v2.10.0-297-gf6727b0 >> + # or hadoop-20160921-113441-20-g094fb7d > > All right, for more complex regular expressions using in-line comments > (extended regexp in Perl) is a good idea. > >> + (?<!-) # see strbuf_check_tag_ref(). Tags can't start with - >> + [A-Za-z0-9.-]+ >> + (?!\.) # refs can't end with ".", see check_refname_format() > > If we can assume that tag name is at least two characters (instead of > at least one character), we could get rid of those extended regexp > lookaround assertions: > > (?<!pattern) - zero-width negative lookbehind assertion > (?!pattern) - zero-width negative lookahead assertion > > That is: > > + [A-Za-z0-9.] # see strbuf_check_tag_ref(). Tags can't start with - > + [A-Za-z0-9.-]* > + [A-Za-z0-9-] # refs can't end with ".", see check_refname_format() Why get rid of them? I'm all for improving the regex, there's bound to be lots of bugs in it, but since it's perl we can freely use its extended features. > Also, the canonical documentation for what is allowed in refnames > is git-check-ref-format(1)... though it does not look like it includes > "tags cannot start with '-'". Yeah, looks like that manpage needs to be patched. > Anyway, perhaps 'is it valid refname' could be passed to a subroutine, > or a named regexp (which might be more involved, like disallowing two > consecutive dots, e.g. "(?!.*\.{2})" at beginning). > >> + -g[0-9a-fA-F]{7,40} > > If we are limiting to git-describe output, we can get rid of A-F here. Indeed. >> + | >> + # Just a normal looking Git SHA1 >> + [0-9a-fA-F]{7,40} >> + ) >> + \b >> + }{ >> $cgi->a({-href => href(action=>"object", hash=>$1), >> -class => "text"}, $1); >> - }eg; >> + }egx; >> >> return $line; >> } >> > > Good work. > > I assume that you are using git-describe output in commit messages > a lot, isn't it? Yeah, and I got tired of gitweb not linking to any of the commits I was referencing.