Re: [RFC] Configuring (future) committags support in gitweb, especially bug linking

I'm interested in cross-linking bug references in commit messages to a
bug tracking system.  I started tinkering a couple weeks ago and am
finally understanding that committags encompass this functionality.
(From the subject line I first understood "tags" to mean git tags rather
than commit message munging.)

Is the committags idea still under active development?

I'd be happy to share what I have, which is for bug linking only, but
very little code would apply to the more general concept of committags.
Here are some ideas that might apply...


Two regexes would make it easier to configure a driver without needing
look-ahead and look-behind assertions.  For example, if you want to
match non-negative integers but only in the context of a Resolves-bug
header:

    Resolves-bug: 1234, 1235

With two regexes, the first selecting the context and the second picking
out each bug ID within it, you can write:

    /^Resolves-bug: \d+(, \d+)*/
    /\d+/

But with only one, you'd have to write:

    /(?<=^Resolves-bug: (\d+, )*)(\d+)/

The need for a look-behind assertion means I have to stop at perlreref to
look up the syntax.  Hrm... and with testing I see that it's worse than that:

    $ perl -wpe 's/(?<=^Resolves-bug: (\d+, )*)(\d+)/[$2]/g'
    Variable length lookbehind not implemented in regex; marked by <--
    HERE in m/(?<=^Resolves-bug: (\d+, )*)(\d+) <-- HERE / at -e line 1.

I guess it can't be done even with the look-behind assertion.
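
For illustration, here is a rough sketch of how the two-regex approach
could be applied in Perl.  The helper name mark_bugs and the square-bracket
markup are made up for this example and are not taken from gitweb or from
the TortoiseSVN spec; the outer regex is the one above with a non-capturing
group and the /m flag added.

```perl
use strict;
use warnings;

# Hypothetical helper: bracket every $inner match, but only inside the
# text that $outer matched.
sub mark_bugs {
    my ($line, $outer, $inner) = @_;
    # /e runs the replacement as code, so the inner substitution works
    # on a copy of each outer match and never sees the rest of the line.
    $line =~ s{($outer)}{
        my $context = $1;
        $context =~ s/($inner)/[$1]/g;
        $context;
    }ge;
    return $line;
}

my $outer = qr/^Resolves-bug: \d+(?:, \d+)*/m;   # selects the context
my $inner = qr/\d+/;                             # picks each bug ID

print mark_bugs("Resolves-bug: 1234, 1235\n", $outer, $inner);
# prints "Resolves-bug: [1234], [1235]"
print mark_bugs("Fixed 42 typos\n", $outer, $inner);
# prints "Fixed 42 typos" unchanged, since the outer regex does not match
```

Neither regex needs a look-behind assertion, which is the point of
splitting the work in two.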


I got the two-regex idea from a spec I ran across while evaluating
Subversion:

http://guest:@tortoisesvn.tigris.org/svn/tortoisesvn/trunk/doc/issuetrackers.txt

If there is interest in BTS integration beyond gitweb, for example in
git-gui, gitk, or the Windows UIs, then perhaps this spec would be worth
considering.  It covers more than just hyperlinking.  It also considers
issues like how to draw the form field for a bug ID as part of a commit
message form, how to validate that form field, and then how to munge the
log message to include the entered ID.  Some details, like using a
newline to separate the two regexes, might be more awkward for Git than
Subversion.


I like the idea of allowing a regex writer -- a gitweb admin or a
repository owner -- to ignore issues regarding HTML escaping.  For
example, I'd rather not have &nbsp; in the regex.  And I don't want the
replacement to have to escape "&" in a query string.  That's a strength
of not having to write the whole link replacement rule.  And I think
hyperlinking will be one of the most common uses of this committag
feature, so it's worth special support.

In the case of false positives, it might also be helpful to have a title
attribute that explains the committag's interpretation of the text.

I also like the idea of giving the admin full control to specify a Perl
function of some sort, which might go as far as looking up bug summaries
for the "title" attribute or adding JS to fetch it via AJAX on
mouse-over.  But I doubt I would bother with that myself.


Appealing as it is, the use of '$1' in my replacements didn't work for me:

    $ perl -wpe '$reg = "(\\d+)"; $rep = ".\$1."; s/$reg/$rep/g'
    123
    .$1.

I think usage of capturing parentheses is important, even with two
regexes, because it makes it easier to specify link text that's broader
than the data that goes in the URL.  Specifically, I wanted to be able
to produce HTML like this, with the hash mark hyperlinked but not used
in the URL:

    <a href="...bug=123">#123</a>

I guess that's just my aesthetic.  To support that, my code calls
sprintf with $&, $1, $2, ... $9, and that particular replacement URL
uses %2$s.
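
The '$1' attempt above fails because s/// interpolates $rep only once and
does not rescan the result for variables.  A sprintf-based replacement
avoids that.  The following is only a sketch of the idea, not my actual
code: the names esc and link_bugs and the bugs.example.com URL are
assumptions made for the example, and esc is a stand-in for gitweb's own
escaping.

```perl
use strict;
use warnings;

# Minimal HTML escaping for this sketch only.
sub esc {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper.  $url_format is a sprintf format in which %1$s is
# the whole match ($&) and %2$s, %3$s, ... are the groups $1, $2, ...
sub link_bugs {
    my ($text, $regex, $url_format) = @_;
    my ($out, $last) = ('', 0);
    while ($text =~ /$regex/g) {
        my ($start, $end) = ($-[0], $+[0]);
        # Collect $& and every capture group; unmatched groups become ''.
        my @args;
        for my $n (0 .. $#+) {
            push @args, defined $-[$n]
                ? substr($text, $-[$n], $+[$n] - $-[$n])
                : '';
        }
        my $url = sprintf($url_format, @args);
        # The regex writer never sees HTML escaping; it all happens here.
        $out .= esc(substr($text, $last, $start - $last))
              . '<a href="' . esc($url) . '">' . esc($args[0]) . '</a>';
        $last = $end;
    }
    return $out . esc(substr($text, $last));
}

print link_bugs('Fixes #123 & more', qr/#(\d+)/,
                'http://bugs.example.com/show?bug=%2$s'), "\n";
# prints: Fixes <a href="http://bugs.example.com/show?bug=123">#123</a> &amp; more
```

The hash mark ends up in the link text but not in the URL, and the
format string is single-quoted so that Perl leaves %2$s alone.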


I'm concerned about the composition of these committag drivers.  In
other words, will it be hard for the configurer to manage interactions
between committag drivers?  To choose a sane order, will I have to
understand the implementation details of each committag driver?

Perhaps a simpler alternative would be to let at most one driver process
a given snippet of text, forbidding nesting of replacements.  (If I
understand Junio's suggestion to use a list of strings and refs,
non-nesting overlaps are already not supported.)  If all replacements
were hyperlinks -- and I expect that to be the common case -- they
wouldn't be nestable anyway.  I wouldn't see it as a huge loss for the
nesting examples I can think of:  Separate rules for span around S-o-b
and linking or obfuscation of email could be combined into one...  A
rule to shade text quoted email-style with leading angle brackets could
just clobber any further processing of that text.  And it might simplify
the code and testing of it quite a bit.
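
To make the non-nesting idea concrete, here is a minimal sketch using a
list of strings and refs, as I understand the suggestion.  Everything in
it is an assumption for illustration: the apply_drivers and esc names,
the [regex, callback] driver shape, and the bugs.example.com URL are not
from gitweb.

```perl
use strict;
use warnings;

# Minimal HTML escaping for this sketch only.
sub esc {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Each driver is a hypothetical [regex, callback] pair; the callback gets
# the matched text and returns finished HTML.
sub apply_drivers {
    my ($text, @drivers) = @_;
    # Plain strings are still open to later drivers; scalar refs hold
    # finished HTML that no later driver may touch.
    my @parts = ($text);
    for my $driver (@drivers) {
        my ($regex, $to_html) = @$driver;
        my @next;
        for my $part (@parts) {
            if (ref $part) { push @next, $part; next; }
            my $last = 0;
            while ($part =~ /$regex/g) {
                my ($start, $end) = ($-[0], $+[0]);
                push @next, substr($part, $last, $start - $last)
                    if $start > $last;
                my $html = $to_html->(substr($part, $start, $end - $start));
                push @next, \$html;
                $last = $end;
            }
            push @next, substr($part, $last) if $last < length $part;
        }
        @parts = @next;
    }
    # Whatever is still a plain string gets escaped exactly once.
    return join '', map { ref($_) ? ${$_} : esc($_) } @parts;
}

my @drivers = (
    [ qr/#\d+/, sub {
          my ($m) = @_;
          (my $id = $m) =~ s/^#//;
          return qq{<a href="http://bugs.example.com/$id">} . esc($m) . '</a>';
      } ],
    [ qr/\d+/, sub { '<b>' . esc($_[0]) . '</b>' } ],
);

print apply_drivers('See #12 and 34 <ok>', @drivers), "\n";
# prints: See <a href="http://bugs.example.com/12">#12</a> and <b>34</b> &lt;ok&gt;
```

The second driver never sees the "12" inside the link, so the order of
drivers only decides who wins a given snippet, not how outputs combine.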

If committags do turn out to support nesting, perhaps it would make
sense to stratify the ordering so that it's clear whether a particular
driver takes as input HTML vs. text and outputs HTML vs. text.  (For
example, weak email obfuscation might be text -> text.)  I guess to
strictly honor the input and output types of a driver, the text -> html
drivers still have to be expanded in a single pass.


A few ideas for drivers that I don't think have been mentioned yet:

* Wiki page names, like [[Feature Documentation]].  These are notable
because they tend to contain punctuation that gets HTML-escaped, like
quotes and ampersands.

* Links to gitweb itself, such as 123abc:file.txt and HEAD:file.txt.  I
guess the current hash linking sort of does the first case except that
you have to get the hash of the blob instead of using the commit hash,
and the current hash linking wouldn't reveal the filename until after
you click, nor when viewing textual log messages.  I'm not sure whether
special support for linking to multi-commit diffs or other object types
would be as helpful.

Marcel


Jakub Narebski wrote:
> On Saturday, 8 November 2008 21:02, Francis Galiegue wrote:
>> On Saturday, 08 November 2008 20:07:53, Jakub Narebski wrote:
>>> Francis Galiegue <fg@xxxxxxxxxxxx> writes
>>> in "Need help for migration from CVS to git in one go..."
>>>
>>>> * third: also Bonsai-related; Bonsai can link to Bugzilla by
>>>> matching (wild guess) /\b(?:#?)(\d+)\b/ and transforming this into
>>>> http://your.bugzilla.fqdn.here/show_bug.cgi?id=$1. Does gitweb have
>>>> this built-in? (haven't looked yet) Is this planned, or has it been
>>>> discussed and been considered not worth the hassle?
> [...]
>
>>> Committags are "tags" in commit messages, expanded when rendering commit
>>> message, like gitweb now does for (shortened) SHA-1, converting them to
>>> 'object' view link.  It should be done in a way to make it easily
>>> configurable, preferably having to configure only variable part, and not
>>> having to write whole replacement rule.
>>>
>>> Possible committags include: _BUG(n)_, bug _#n_, _FEATURE(n)_,
>>> Message-Id, plain text URL e.g. _http://repo.or.cz_, spam protecting
>>> of email addresses, "rich text formatting" like *bold* and _underline_,
>>> syntax highlighting of signoff lines.
>>>
>> What do you mean with "not having to write whole replacement rule"?
>
> Like in example with 'link' rule, not having to write whole
> <a href="http://example.com/bugzilla.php?id=$1">$&</a>
> (or something like that).
>
>>> I think it would be good idea to use repository config file for
>>> setting-up repository-specific committags, and use whatever Perl
>>> structure for global configuration. The config language can be
>>> borrowed from "drivers" in gitattributes (`diff' and `merge' drivers).
>>>
>>> So the example configuration could look like this:
>>>
>>>   [gitweb]
>>>      committags = sha1 signoff bugzilla
>>>
>>>   [committag "bugzilla"]
>>>      match = "\\b(?:#?)(\\d+)\\b"
>>>      link  = "http://your.bugzilla.fqdn.here/show_bug.cgi?id=$1"
>>>
>>> where 'sha1' and 'signoff' are built-in committags, committags are
>>> applied in the order they are put in gitweb.committags;
>> I don't understand what the "signoff" builtin is : is that a link to see only
>> commits "Signed-off-by:" a particular person?
>
> Committags don't need to be replaced by links. In this case I meant
> here using 'signoff' class for Signed-off-by: (and the like) lines, by
> wrapping it in '<span class="signoff">' ... '</span>'.
>
>> And also, what about the sha1 builtin? AFAIK, a SHA1 can point to a commit, a
>> tree, and others... In fact, it points to any of these right now, but how
>> would you tell apart these different SHA1s in a commit message? The only
>> obvious use I see for it is the builtin "Revert ..." commit message, that the
>> committer _can_ override...
>
> SHA1 (or shortened SHA1 from 8 characters to 40 characters, or to be
> even more exact something that looks like SHA1) is replaced by link
> to 'object' view, which in turn finds type of object and _redirect_
> to proper view, be it 'commit' (most frequent), 'tag', 'blob' or 'tree'.
>
> We could have used instead gitweb link with 'h' (hash) parameter, but
> without 'a' (action) parameter, which currently finds type of object
> and _uses_ correct view...
>
>> Finally, is there any reason to think that a sha1 or signoff committag will
>> ever need to be overridden in some way?
>
> One might not want to link SHA1, for example if there are lots of false
> positives because of commit message conventions or something, or refine
> 'signoff' committag to use different styles for different types of
> signoff: Signed-off-by, Acked-by, Tested-by, other.  Having explicit
> 'signoff' committag allows us also to put some committags _after_ it,
> for example SPAM-protection of emails, or add some committag before
> 'sha1' to filter out some SHA1 match false positives.
>
>>> possible actions
>>> for committag driver include:
>>>  * link: replace $match by '_<a href="$link">_$match_</a>_'
>>>  * html: replace $match by '_$html_'
>>>  * text: replace $match by '$text'
>>> where '_a_' means that 'a' is treated as HTML, and is not expanded
>>> further, and 'b' means that it can be further expanded by later
>>> committags, and finally is HTML-escaped (esc_html).
>>>
>> What use do you see for the html match? Just asking...
>
> For example 'signoff' committag... well, it is not exactly pure "html"
> but rather something like template.
>
>   [committag "signoff"]
>         match = "(?i)^ *(signed[ \\-]off[ \\-]by[ :]|acked[ \\-]by[ :]|cc[ :])"
>         templ = "{<span class=\"signoff\">}$1{</span>}"
>
> Or simpler
>
>   [committag "signoff"]
>         match = "(?i)^ *(signed[ \\-]off[ \\-]by[ :]|acked[ \\-]by[ :]|cc[ :])"
>         class = signoff
>
>> And I don't see what your '_a_' and '_b_' are about...
>
> For example in link match, the text of the link can be further refined
> by committags later in sequence.
>
> --
> Jakub Narebski
> Poland
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

