Re: gigantic commit messages, was Re: Git Bug Report: out of memory using git tag

On Wed, Nov 02 2022, Jeff King wrote:

> On Wed, Nov 02, 2022 at 01:14:59AM -0700, Elijah Newren wrote:
>
>> On Wed, Nov 2, 2022 at 12:51 AM Jeff King <peff@xxxxxxxx> wrote:
>> >
>> > Here are patches which fix them both. I may be setting a new record for
>> > the ratio of commit message lines to changed code
>> 
>> It looks like the first patch is 72 lines of commit message for a
>> one-line fix, and the second patch is 61 lines of commit message for a
>> two line fix.
>> 
>> I don't know what the record ratio is, but it's at least 96[1], so
>> clearly you'll need to figure out how to pad your first commit message
>> with at least another 25 lines before this series can be accepted.
>> ;-)
>
> Well, if we want to start digging things up... ;)
>
> Try this:
>
>   git log --no-merges --no-renames --format='%H %B' -z --numstat '*.c' |
>   perl -0ne '
>     chomp;
>     if (s/^([0-9a-f]{40}) //) {
>       if (defined $commit && $diff) {
>         my $ratio = $body / $diff;
>         print "$ratio $body $diff $commit\n";
>       }
>       $commit = $1;
>       $body = () = /\n/g;
>       $diff = 0;
>     } elsif (/^\s*(\d+)\t/) {
>       # this counts only added lines, under the assumption that
>       # small commits generally remove/add in proportion. Of course
>       # ones that _only_ remove lines have infinite ratios.
>       $diff += $1;
>     } else {
>       die "confusing record: $_\n";
>     }
>   ' |
>   sort -rn |
>   head
>
> which shows there are a few in the 100's. Pipe through:
>
>   awk '{print $4}' |
>   git log --stdin --no-walk=unsorted --stat
>
> for a nicer view. I'm rejecting the top one on the grounds that it's
> mostly cut-and-paste output, and also that #2 is mine. ;)

I think that '*.c' is cheating. If anything, I should be getting more
points once you remove it, as I've been over-explaining the addition or
removal of a compiler flag or something. At least your #2 is tricky
C code :)

I haven't bothered to do this, but I think that if you used --word-diff
--word-diff-regex=. and parsed the resulting diff you'd get "better"
results.
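
For what it's worth, here's a rough sketch of the parsing step. It
assumes the --word-diff=porcelain format, where changed runs are prefixed
with "+" or "-", unchanged runs with " ", and newlines show up as a lone
"~". The helper name count_changed_chars is made up for illustration, and
I haven't run it over real history:

```shell
# Count characters in added/removed runs of a --word-diff=porcelain diff
# read from stdin. Header lines ("---", "+++", "index ...") come before
# the first "@@" of each file, so they are skipped by tracking hunk state.
count_changed_chars() {
  awk '
    /^diff --git / { in_hunk = 0; next }   # new file: back to header mode
    /^@@/          { in_hunk = 1; next }   # hunk header: content follows
    in_hunk && /^[+-]/ { n += length($0) - 1 }  # drop the 1-char prefix
    END { print n + 0 }
  '
}
```

Something like "git show --format= --word-diff=porcelain
--word-diff-regex=. <commit> | count_changed_chars" would then give a
per-character change count to divide the message length by.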

Or, for a better and similar (but not identical) measure: compute the
Levenshtein distance between the pre- and post-image, and compare that
edit distance to the commit message length.
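
If you don't have Text::Levenshtein handy, a minimal sketch of the same
computation in awk might look like the following. The function names
levenshtein and msg_per_edit are hypothetical, and since values are
passed via "awk -v", backslashes in the inputs would be interpreted as
escapes, so treat it as a toy:

```shell
# Classic two-row dynamic-programming Levenshtein distance between $1 and $2.
levenshtein() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    la = length(a); lb = length(b)
    for (j = 0; j <= lb; j++) prev[j] = j        # distance from "" to b[1..j]
    for (i = 1; i <= la; i++) {
      cur[0] = i                                 # distance from a[1..i] to ""
      for (j = 1; j <= lb; j++) {
        cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
        m = prev[j] + 1                                     # deletion
        if (cur[j-1] + 1 < m) m = cur[j-1] + 1              # insertion
        if (prev[j-1] + cost < m) m = prev[j-1] + cost      # substitution
        cur[j] = m
      }
      for (j = 0; j <= lb; j++) prev[j] = cur[j]
    }
    print prev[lb]
  }'
}

# Message length ($1) per unit of edit distance ($2); prints nothing if $2 is 0.
msg_per_edit() {
  awk -v m="$1" -v d="$2" 'BEGIN { if (d > 0) printf "%.1f\n", m / d }'
}
```

You'd feed levenshtein the changed pre- and post-image text, and
msg_per_edit the message size (lines, words or characters, your pick)
plus the resulting distance.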

I haven't done that, but just from eyeballing it I think [1] beats your
[2] by that criterion. Per:
	
	$ perl -MText::Levenshtein=distance -wE 'say distance @ARGV' int unsigned
	6
	$ perl -MText::Levenshtein=distance -wE 'say distance @ARGV' "" _lf
	3

Its edit distance is half of yours, so it should get 2x the score vs.
yours unless your message is twice as long, and yours is <2x the
words/characters.

(Edit: But see [4] below)

There's also e.g. my [3], which is fairly high in the running per your
"only added lines" rule. But I think it shows the perils of doing that:
in general I don't see why you'd omit deletions. That commit message
certainly spends most of its time explaining why deleting the code at
hand is OK.

Once you count deletions it'll drop *way* down the list, as it has 11
deleted lines and 1 added.
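
A sketch of what counting both columns could look like, assuming the
usual --numstat layout of "added<TAB>deleted<TAB>path" (the function name
numstat_churn is just for illustration):

```shell
# Sum added + deleted lines from --numstat output on stdin.
# Binary files show "-" in both columns, so only numeric rows are counted.
numstat_churn() {
  awk -F'\t' '
    $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { n += $1 + $2 }
    END { print n + 0 }
  '
}
```

With "git show --numstat --format= <commit> | numstat_churn", a commit
with 1 added and 11 deleted lines counts as 12 changed lines rather
than 1.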

Hrm, I take some of the above back, I think [4] might be the winner.
That's just an edit distance of 1, so it's around 2x the commit message
length of yours if we adjust for your score of 6. (~2.5 by
characters)[5].

1. 356c4732950 (credential: treat CR/LF as line endings in the
   credential protocol, 2020-10-03)
2. aec0bba106d (config: work around gcc-10 -Wstringop-overflow warning,
   2020-08-04)
3. f97fe358576 (pickaxe -G: don't special-case create/delete,
   2021-04-12)
4. c58bebd4c67 (ci: update Cirrus-CI image to FreeBSD 12.3, 2022-05-25)
5. All measured with "git show --no-notes --no-patch <commit> | wc",
   because I was lazy.


