[RFC PATCH] gitweb: improve title shortening heuristics

Hi,

This is my first post to this ML, so apologies in advance if I missed
anything in the patch submission guidelines.

We recently got a report that the commit short title on the postgres gitweb
instance was sometimes being mangled [1].  After a bit of digging, it appears
to be due to a long-standing heuristic that removes some uninteresting parts
of a commit message (see 198066916a8 from August 2005).  In our case, it
removed "master." from the commit title even though it was part of
"postmaster.c" rather than a cname (or something that looks like one), leading
to the commit message:

Remove postmaster.c's reset_shared() wrapper function.

being displayed as:

Remove postc's reset_shared() wrapper function.
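
For illustration, here is a minimal standalone sketch that reproduces the
mangling.  It isolates only the "master." substitution from
parse_commit_text; the sub name shorten_current() is made up for this example
and does not exist in gitweb.perl.

```perl
use strict;
use warnings;

# Hypothetical helper: applies only the current "master." substitution,
# guarded by the same length check gitweb uses.
sub shorten_current {
	my ($title) = @_;
	if (length($title) > 50) {
		# No anchoring: matches "master." anywhere, including inside
		# "postmaster.c".
		$title =~ s/(master|www|rsync)\.//;
	}
	return $title;
}

print shorten_current("Remove postmaster.c's reset_shared() wrapper function."), "\n";
# prints: Remove postc's reset_shared() wrapper function.
```

The title is 54 characters long, so it passes the length check and the
substitution is applied.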

It's probably a corner case that draws very few complaints, so it doesn't look
worthwhile to spend too much effort on it.  It also seems impossible to make
the current approach entirely bulletproof, but if we simply make sure that the
prefix is preceded by at least one whitespace character and is immediately
followed by a non-whitespace character, we could avoid almost all of the
incorrect matches (and all of them as far as postgres is concerned).  Would
that be an acceptable compromise?  If so, I'm attaching a patch that does that
(and also adds git:// and https:// to the list of trimmed protocols while at
it).
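
As a rough sketch of the intended behavior, the standalone script below
applies a substitution equivalent to the proposed one.  It is written with $2
and \S, which should match the same text as \2 and [^\s] here; the sub name
shorten_proposed() and the merge title are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: the prefix must follow whitespace and must be
# immediately followed by a non-whitespace character.
sub shorten_proposed {
	my ($title) = @_;
	if (length($title) > 50) {
		$title =~ s/\s+(master|www|rsync)\.(\S)/ $2/;
	}
	return $title;
}

# "master." is in the middle of a word here, so the title is left untouched.
print shorten_proposed("Remove postmaster.c's reset_shared() wrapper function."), "\n";
# prints: Remove postmaster.c's reset_shared() wrapper function.

# A host-like "master." prefix after a space is still trimmed.
print shorten_proposed("Merge branch 'fixes' of master.kernel.org:/pub/scm/git/example"), "\n";
# prints: Merge branch 'fixes' of kernel.org:/pub/scm/git/example
```

A title that starts with the prefix is no longer trimmed, since nothing
precedes it.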

Otherwise, would it be acceptable to disable the whole block (the "remove
leading stuff of merges to make the interesting part visible") with some new
configuration option?

Cheers,
Julien.

[1] https://www.postgresql.org/message-id/flat/4025723.1658013974%40sss.pgh.pa.us
From ed46dcd2796b9af6ba3f73d46a3141a88964ed11 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@xxxxxxx>
Date: Sun, 24 Jul 2022 13:17:19 +0800
Subject: [PATCH v1] gitweb: improve title_short shortening heuristics

In order to shorten the title, some common domain prefixes are detected and
removed.  However, the current regex matches those prefixes anywhere in the
title, which makes it likely to remove one where that is not intended.

To make that case less likely, make sure that the prefix is preceded by at
least one whitespace character and is immediately followed by a
non-whitespace character.

While at it, also add git:// and https:// to the list of detected and trimmed
protocols.

Signed-off-by: Julien Rouhaud <julien.rouhaud@xxxxxxx>
---
 gitweb/gitweb.perl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 1835487ab2..18dd0b93fb 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3565,10 +3565,10 @@ sub parse_commit_text {
 				$title =~ s/^Automatic //;
 				$title =~ s/^merge (of|with) /Merge ... /i;
 				if (length($title) > 50) {
-					$title =~ s/(http|rsync):\/\///;
+					$title =~ s/(git|http|https|rsync):\/\///;
 				}
 				if (length($title) > 50) {
-					$title =~ s/(master|www|rsync)\.//;
+					$title =~ s/\s+(master|www|rsync)\.(\S)/ $2/;
 				}
 				if (length($title) > 50) {
 					$title =~ s/kernel.org:?//;
-- 
2.37.0
