I'm interested in cross-linking bug references in commit messages to a
bug tracking system. I started tinkering a couple of weeks ago and only
now understand that committags encompass this functionality. (From the
subject line I first understood "tags" to mean git tags rather than
commit message munging.)

Is the committags idea still under active development? I'd be happy to
share what I have, which is for bug linking only, but very little of the
code would apply to the more general concept of committags.

Here are some ideas that might apply...

Two regexes would make it easier to configure a driver without needing
look-ahead and look-behind assertions. For example, suppose you want to
match non-negative integers, but only in the context of a Resolves-bug
header:

    Resolves-bug: 1234, 1235

With two regexes you can write:

    /^Resolves-bug: \d+(, \d+)*/
    /\d+/

But with only one, you'd have to write:

    /(?<=^Resolves-bug: (\d+, )*)(\d+)/

Needing a look-behind assertion means I have to stop and consult
perlreref to look up the syntax. Hrm... and with testing I see that it's
worse than that:

    $ perl -wpe 's/(?<=^Resolves-bug: (\d+, )*)(\d+)/[$2]/g'
    Variable length lookbehind not implemented in regex; marked by <-- HERE in m/(?<=^Resolves-bug: (\d+, )*)(\d+) <-- HERE / at -e line 1.

So it can't be done even with the look-behind assertion, because the
look-behind here would have to be variable-length.

I got the two-regex idea from a spec I ran across while evaluating
Subversion:

    http://guest:@tortoisesvn.tigris.org/svn/tortoisesvn/trunk/doc/issuetrackers.txt

If there is interest in BTS integration beyond gitweb, for example in
git-gui, gitk, or the Windows UIs, then perhaps this spec would be worth
considering. It covers more than just hyperlinking: it also considers
how to draw the form field for a bug ID as part of a commit message
form, how to validate that form field, and how to munge the log message
to include the entered ID. Some details, like using a newline to
separate the two regexes, might be more awkward for Git than for
Subversion.

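The two-regex idea above can be sketched in a few lines of Perl. This is
only an illustration, not code from gitweb or from my patch: the helper
name mark_bug_ids is made up, the /m flag on the outer regex is an
assumption so that the header can appear on any line of the message, and
the bracket markup just mirrors the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every $inner match inside one region.
sub bracket_ids {
    my ($region, $inner) = @_;
    $region =~ s/($inner)/[$1]/g;
    return $region;
}

# Hypothetical helper: the outer regex selects the regions of interest,
# the inner regex selects the bug IDs within each region. Text outside
# the outer matches is left untouched.
sub mark_bug_ids {
    my ($text, $outer, $inner) = @_;
    $text =~ s{($outer)}{ bracket_ids($1, $inner) }ge;
    return $text;
}

my $outer = qr/^Resolves-bug: \d+(, \d+)*/m;
my $inner = qr/\d+/;

print mark_bug_ids("Resolves-bug: 1234, 1235\nSee also 42\n", $outer, $inner);
```

The "42" on the second line is left alone because it never falls inside
an outer match, which is the effect the look-behind version was trying
to get.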
I like the idea of allowing a regex writer -- a gitweb admin or a
repository owner -- to ignore issues regarding HTML escaping. For
example, I'd rather not have HTML entities in the regex. And I don't
want the replacement to have to escape "&" in a query string. That's a
strength of not having to write the whole link replacement rule. And I
think hyperlinking will be one of the most common uses of this committag
feature, so it's worth special support. In the case of false positives,
it might also be helpful to have a title attribute that explains the
committag's interpretation of the text.

I also like the idea of giving the admin full control to specify a Perl
function of some sort, which might go as far as looking up bug summaries
for the "title" attribute or adding JS to fetch it via AJAX on
mouse-over. But I doubt I would bother with that myself.

Appealing as it is, the use of '$1' in my replacements didn't work for
me:

    $ perl -wpe '$reg = "(\\d+)"; $rep = ".\$1."; s/$reg/$rep/g'
    123
    .$1.

The replacement side of s/// interpolates $rep only once, so the '$1'
stored inside it comes out literally.

I think the use of capturing parentheses is important, even with two
regexes, because it makes it easier to specify link text that's broader
than the data that goes in the URL. Specifically, I wanted to be able to
produce HTML like this, with the hash mark hyperlinked but not used in
the URL:

    <a href="...bug=123">#123</a>

I guess that's just my aesthetic. To support that, my code calls sprintf
with $&, $1, $2, ... $9, and that particular replacement URL uses %2$s.

I'm concerned about the composition of these committag drivers. In other
words, will it be hard for the configurer to manage interactions between
committag drivers? To choose a sane order, will I have to understand the
implementation details of each committag driver?

Perhaps a simpler alternative would be to let at most one driver process
a given snippet of text, forbidding nesting of replacements. (If I
understand Junio's suggestion to use a list of strings and refs,
non-nesting overlaps are already not supported.)

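To make the non-nesting alternative concrete, here is a rough sketch
that combines it with the sprintf-style URL template and automatic HTML
escaping described above. Everything named here is an assumption for
illustration: the driver hash keys 'match' and 'url', the helpers esc,
apply_driver and render, and the bugs.example.com URL. It is not
gitweb's code, and it only follows my reading of the list of strings and
refs idea: plain strings are unprocessed text, scalar refs are finished
HTML that later drivers skip.

```perl
use strict;
use warnings;

# Minimal HTML escaping for this sketch; gitweb has its own esc_html().
sub esc {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Apply one hypothetical driver to a list of segments. Scalar refs are
# passed through untouched, so at most one driver claims a given snippet.
sub apply_driver {
    my ($driver, @segments) = @_;
    my @out;
    for my $seg (@segments) {
        if (ref $seg) { push @out, $seg; next; }
        my $pos = 0;
        while ($seg =~ /$driver->{match}/g) {
            my ($start, $end) = ($-[0], $+[0]);
            # Whole match first, then the capture groups, so the URL
            # template can use %1$s for $& and %2$s for $1.
            my @args = map {
                defined $-[$_] ? substr($seg, $-[$_], $+[$_] - $-[$_]) : ''
            } 0 .. $#+;
            push @out, substr($seg, $pos, $start - $pos) if $start > $pos;
            my $url  = sprintf $driver->{url}, @args;
            my $html = sprintf '<a href="%s">%s</a>', esc($url), esc($args[0]);
            push @out, \$html;
            $pos = $end;
        }
        push @out, substr($seg, $pos) if $pos < length $seg;
    }
    return @out;
}

# Run the drivers in order, then escape whatever text nobody claimed.
sub render {
    my ($text, @drivers) = @_;
    my @segments = ($text);
    @segments = apply_driver($_, @segments) for @drivers;
    return join '', map { ref $_ ? $$_ : esc($_) } @segments;
}

my $bug = {
    match => qr/#(\d+)/,
    url   => 'http://bugs.example.com/show?product=git&bug=%2$s',
};

print render("Fix <b> parsing, see #123 & #45\n", $bug);
```

The driver author writes a raw "&" in the URL and never sees an HTML
entity; the escaping happens once, after sprintf. Order still matters in
this scheme, but only as a priority: an earlier driver wins any text it
matches, and no driver needs to know what markup another one emits.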
If all replacements were hyperlinks -- and I expect that to be the
common case -- they wouldn't be nestable anyway. I wouldn't see it as a
huge loss for the nesting examples I can think of:

  * Separate rules for a span around S-o-b and for linking or
    obfuscation of email could be combined into one...
  * A rule to shade text quoted email-style with leading angle brackets
    could just clobber any further processing of that text.

And it might simplify the code, and the testing of it, quite a bit.

If committags do turn out to support nesting, perhaps it would make
sense to stratify the ordering so that it's clear whether a particular
driver takes HTML or text as input and produces HTML or text as output.
(For example, weak email obfuscation might be text -> text.) I guess
that to strictly honor the input and output types of a driver, the
text -> html drivers would still have to be expanded in a single pass.

A few ideas for drivers that I don't think have been mentioned yet:

  * Wiki page names, such as [[Feature Documentation]]. These are
    notable because they tend to contain punctuation that gets
    HTML-escaped, like quotes and ampersands.
  * Links to gitweb itself, such as 123abc:file.txt and HEAD:file.txt.
    I guess the current hash linking sort of does the first case,
    except that you have to get the hash of the blob instead of using
    the commit hash, and the current hash linking wouldn't reveal the
    filename until after you click, nor when viewing textual log
    messages. I'm not sure whether special support for linking to
    multi-commit diffs or other object types would be as helpful.

Marcel


Jakub Narebski wrote:
> On Saturday, 8 November 2008 21:02, Francis Galiegue wrote:
>> On Saturday 08 November 2008 20:07:53, Jakub Narebski wrote:
>>> Francis Galiegue <fg@xxxxxxxxxxxx> writes
>>> in "Need help for migration from CVS to git in one go..."
>>>
>>>> * third: also Bonsai-related; Bonsai can link to Bugzilla by
>>>> matching (wild guess) /\b(?:#?)(\d+)\b/ and transforming this into
>>>> http://your.bugzilla.fqdn.here/show_bug.cgi?id=$1. Does gitweb have
>>>> this built-in? (haven't looked yet) Is this planned, or has it been
>>>> discussed and been considered not worth the hassle?
> [...]
>
>>> Committags are "tags" in commit messages, expanded when rendering commit
>>> message, like gitweb now does for (shortened) SHA-1, converting them to
>>> 'object' view link. It should be done in a way to make it easy
>>> configurable, preferably having to configure only variable part, and not
>>> having to write whole replacement rule.
>>>
>>> Possible committags include: _BUG(n)_, bug _#n_, _FEATURE(n),
>>> Message-Id, plain text URL e.g. _http://repo.or.cz_, spam protecting
>>> of email addresses, "rich text formatting" like *bold* and _underline_,
>>> syntax highlighting of signoff lines.
>>>
>> What do you mean with "not having to write whole replacement rule"?
>
> Like in example with 'link' rule, not having to write whole
> <a href="http://example.com/bugzilla.php?id=$1">$&</a>
> (or something like that).
>
>>> I think it would be good idea to use repository config file for
>>> setting-up repository-specific committags, and use whatever Perl
>>> structure for global configuration. The config language can be
>>> borrowed from "drivers" in gitattributes (`diff' and `merge' drivers).
>>>
>>> So the example configuration could look like this:
>>>
>>>   [gitweb]
>>>       committags = sha1 signoff bugzilla
>>>
>>>   [committag "bugzilla"]
>>>       match = "\\b(?:#?)(\\d+)\\b"
>>>       link = "http://your.bugzilla.fqdn.here/show_bug.cgi?id=$1"
>>>
>>> where 'sha1' and 'signoff' are built-in committags, committags are
>>> applied in the order they are put in gitweb.committags;
>> I don't understand what the "signoff" builtin is : is that a link to see only
>> commits "Signed-off-by:" a particular person?
>
> Committags don't need to be replaced by links. In this case I meant
> here using 'signoff' class for Signed-off-by: (and the like) lines, by
> wrapping it in '<span class="signoff">' ... '</span>'.
>
>> And also, what about the sha1 builtin? AFAIK, a SHA1 can point to a commit, a
>> tree, and others... In fact, it points to any of these right now, but how
>> would you tell apart these different SHA1s in a commit message? The only
>> obvious use I see for it is the builtin "Revert ..." commit message, that the
>> committer _can_ override...
>
> SHA1 (or shortened SHA1 from 8 characters to 40 characters, or to be
> even more exact something that looks like SHA1) is replaced by link
> to 'object' view, which in turn finds type of object and _redirect_
> to proper view, be it 'commit' (most frequent), 'tag', 'blob' or 'tree'.
>
> We could have used instead gitweb link with 'h' (hash) parameter, but
> without 'a' (action) parameter, which currently finds type of object
> and _uses_ correct view...
>
>> Finally, is there any reason to think that a sha1 or signoff committag will
>> ever need to be overridden in some way?
>
> One might not want to link SHA1, for example if there are lots of false
> positives because of commit message conventions or something, or refine
> 'signoff' committag to use different styles for different types of
> signoff: Signed-off-by, Acked-by, Tested-by, other. Having explicit
> 'signoff' committag allows us also to put some committags _after_ it,
> for example SPAM-protection of emails, or add some committag before
> 'sha1' to filter out some SHA1 match false positives.
>
>>> possible actions
>>> for committag driver include:
>>>  * link: replace $match by '_<a href="$link">_$match_</a>_'
>>>  * html: replace $match by '_$html_'
>>>  * text: replace $match by '$text'
>>> where '_a_' means that 'a' is treated as HTML, and is not expanded
>>> further, and 'b' means that it can be further expanded by later
>>> committags, and finally is HTML-escaped (esc_html).
>>>
>> What use do you see for the html match? Just asking...
>
> For example 'signoff' committag... well, it is not exactly pure "html"
> but rather something like template.
>
>   [committag "signoff"]
>       match = "(?i)^ *(signed[ \\-]off[ \\-]by[ :]|acked[ \\-]by[ :]|cc[ :])"
>       templ = "{<span class=\"signoff\">}$1{</span>}"
>
> Or simpler
>
>   [committag "signoff"]
>       match = "(?i)^ *(signed[ \\-]off[ \\-]by[ :]|acked[ \\-]by[ :]|cc[ :])"
>       class = signoff
>
>> And I don't see what you '_a_' and '_b_' are about...
>
> For example in link match, the text of the link can be further refined
> by committags later in sequence.
>
> --
> Jakub Narebski
> Poland
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html