Hi,

First of all, this is my first time on this mailing list, so apologies in advance if I missed anything in the patch submission guidelines.

We recently got a report that the short commit title on the postgres gitweb instance was sometimes being mangled [1]. After a bit of digging, it appears to be due to some long-standing heuristics that remove uninteresting parts of a commit title (see 198066916a8 from August 2005). In our case, the first occurrence of "master." was removed from the title even when it was part of "postmaster.c" rather than a host name (or something that looks like one). As a result, the commit title:

    Remove postmaster.c's reset_shared() wrapper function.

was displayed as:

    Remove postc's reset_shared() wrapper function.

(That title is 54 characters long, so it exceeds the 50-character threshold above which gitweb applies the trimming.)

This is probably a corner case that rarely draws complaints, so it doesn't seem worth spending much effort on it. It also seems impossible to make the current approach entirely bulletproof, but if we simply require the prefix to be preceded by at least one whitespace character and not followed by another one, we could avoid almost all of the incorrect matches (and all of them as far as postgres is concerned).

Would that be an acceptable compromise? If so, I'm attaching a patch that does that (and, while at it, also adds git:// and https:// to the list of trimmed protocols). Otherwise, would it be acceptable to allow disabling the whole "remove leading stuff of merges to make the interesting part visible" block with a new configuration option?

Cheers,
Julien.

[1] https://www.postgresql.org/message-id/flat/4025723.1658013974%40sss.pgh.pa.us
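For reference, the effect of the old and proposed regexes can be checked outside gitweb with the minimal Perl sketch below. It only mirrors the two trimming steps touched by the patch, not the whole of parse_commit_text(); the helper names shorten_old() and shorten_new() and the second sample title are hypothetical, made up for illustration. The replacement uses $2, the form Perl recommends over \2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the current gitweb steps, which strip the first
# "master.", "www." or "rsync." found anywhere in a long title.
sub shorten_old {
	my ($title) = @_;
	if (length($title) > 50) {
		$title =~ s/(http|rsync):\/\///;
	}
	if (length($title) > 50) {
		$title =~ s/(master|www|rsync)\.//;
	}
	return $title;
}

# Hypothetical helper: the proposed steps. The prefix must follow
# whitespace and be followed by a non-whitespace character; git:// and
# https:// are also trimmed.
sub shorten_new {
	my ($title) = @_;
	if (length($title) > 50) {
		$title =~ s/(git|http|https|rsync):\/\///;
	}
	if (length($title) > 50) {
		# Keep one space and the captured first character after the prefix.
		$title =~ s/\s+(master|www|rsync)\.(\S)/ $2/;
	}
	return $title;
}

for my $title (
	"Remove postmaster.c's reset_shared() wrapper function.",
	"Merge branch 'for-linus' of git://master.kernel.org/pub/scm/linux",
) {
	printf "old: %s\nnew: %s\n\n", shorten_old($title), shorten_new($title);
}
```

With the old steps the first title becomes "Remove postc's reset_shared() wrapper function.", while the proposed steps leave it intact. The second title still gets its host prefix trimmed, and with the proposed steps the git:// scheme is removed as well.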
From ed46dcd2796b9af6ba3f73d46a3141a88964ed11 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@xxxxxxx>
Date: Sun, 24 Jul 2022 13:17:19 +0800
Subject: [PATCH v1] gitweb: improve title_short shortening heuristics

In order to shorten the title, some common domain prefixes can be
detected and removed. However, the current regex matches those
prefixes anywhere in the title, which makes it likely to remove them
where that's not intended.

To make that case less likely, make sure that the prefix is preceded
by at least one whitespace character and isn't followed by another
whitespace character.

While at it, also add git:// and https:// to the list of detected and
trimmed protocols.

Signed-off-by: Julien Rouhaud <julien.rouhaud@xxxxxxx>
---
 gitweb/gitweb.perl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 1835487ab2..18dd0b93fb 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3565,10 +3565,10 @@ sub parse_commit_text {
 				$title =~ s/^Automatic //;
 				$title =~ s/^merge (of|with) /Merge ... /i;
 				if (length($title) > 50) {
-					$title =~ s/(http|rsync):\/\///;
+					$title =~ s/(git|http|https|rsync):\/\///;
 				}
 				if (length($title) > 50) {
-					$title =~ s/(master|www|rsync)\.//;
+					$title =~ s/\s+(master|www|rsync)\.([^\s])/ $2/;
 				}
 				if (length($title) > 50) {
 					$title =~ s/kernel.org:?//;
-- 
2.37.0