On Wed, Nov 02 2022, Jeff King wrote:

> On Wed, Nov 02, 2022 at 01:14:59AM -0700, Elijah Newren wrote:
>
>> On Wed, Nov 2, 2022 at 12:51 AM Jeff King <peff@xxxxxxxx> wrote:
>> >
>> > Here are patches which fix them both. I may be setting a new record for
>> > the ratio of commit message lines to changed code
>>
>> It looks like the first patch is 72 lines of commit message for a
>> one-line fix, and the second patch is 61 lines of commit message for a
>> two line fix.
>>
>> I don't know what the record ratio is, but it's at least 96[1], so
>> clearly you'll need to figure out how to pad your first commit message
>> with at least another 25 lines before this series can be accepted.
>> ;-)
>
> Well, if we want to start digging things up... ;)
>
> Try this:
>
>   git log --no-merges --no-renames --format='%H %B' -z --numstat '*.c' |
>   perl -0ne '
>     chomp;
>     if (s/^([0-9a-f]{40}) //) {
>       if (defined $commit && $diff) {
>         my $ratio = $body / $diff;
>         print "$ratio $body $diff $commit\n";
>       }
>       $commit = $1;
>       $body = () = /\n/g;
>       $diff = 0;
>     } elsif (/^\s*(\d+)\t/) {
>       # this counts only added lines, under the assumption that
>       # small commits generally remove/add in proportion. Of course
>       # ones that _only_ remove lines have infinite ratios.
>       $diff += $1;
>     } else {
>       die "confusing record: $_\n";
>     }
>   ' |
>   sort -rn |
>   head
>
> which shows there are a few in the 100's. Pipe through:
>
>   awk '{print $4}' |
>   git log --stdin --no-walk=unsorted --stat
>
> for a nicer view. I'm rejecting the top one on the grounds that it's
> mostly cut-and-paste output, and also that #2 is mine. ;)

I think that '*.c' is cheating, if anything I should be getting more
points when you remove that, as I've been over explaining
adding/removing a compiler flag or something. At least your #2 is tricky
C code :)

I haven't bothered to do this, but I think if you --word-diff
--word-diff-regex=. and parse the resulting diff you'd get "better"
results.

Or, for better & similar (but not the same): compute the Levenshtein
distance between the pre- and post-image, and take the ratio of commit
message length to that edit distance. I haven't done that, but just from
eyeballing it I think [1] beats your [2] by that criterion. Per:

    $ perl -MText::Levenshtein=distance -wE 'say distance @ARGV' int unsigned
    6
    $ perl -MText::Levenshtein=distance -wE 'say distance @ARGV' "" _lf
    3

It should get 2x the score vs. yours, but yours is <2x the
words/characters. (Edit: But see [4] below)

There's also e.g. my [3] that's fairly high in the running per your
"only added lines". But I think it shows the perils of doing that,
i.e. in general I don't see why you'd omit deletions. That commit
message is certainly spending most of its time talking about why the
deletion of the code at hand is OK.

Once you count deletions it'll get *way* down the list, as it's 11
deleted lines, 1 added.

Hrm, I take some of the above back, I think [4] might be the
winner. That's just an edit distance of 1, so it's around 2x the commit
message length of yours if we adjust for your score of 6. (~2.5 by
characters)[5].

1. 356c4732950 (credential: treat CR/LF as line endings in the
   credential protocol, 2020-10-03)
2. aec0bba106d (config: work around gcc-10 -Wstringop-overflow warning,
   2020-08-04)
3. f97fe358576 (pickaxe -G: don't special-case create/delete,
   2021-04-12)
4. c58bebd4c67 (ci: update Cirrus-CI image to FreeBSD 12.3, 2022-05-25)
5. All measured with "git show --no-notes --no-patch <commit> | wc",
   because I was lazy.
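
For anyone who wants to play with the edit-distance scoring without
installing Text::Levenshtein, here's a rough sketch in plain POSIX shell
plus awk. The function names (levenshtein, msg_ratio) are made up for
this mail, and the "score" is just message characters divided by edit
distance, which is my reading of the idea above and not anything git
provides. Passing the strings via "awk -v" means backslashes in them get
interpreted as escapes, so treat it as a toy for short tokens.

```shell
# levenshtein PRE POST
# Prints the Levenshtein edit distance between two strings, using the
# usual two-row dynamic programming table.
# Caveat: "awk -v" interprets backslash escapes in the arguments.
levenshtein() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        la = length(a); lb = length(b)
        # Distance from the empty prefix of a to each prefix of b.
        for (j = 0; j <= lb; j++) prev[j] = j
        for (i = 1; i <= la; i++) {
            cur[0] = i
            for (j = 1; j <= lb; j++) {
                cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
                m = prev[j] + 1                                   # deletion
                if (cur[j - 1] + 1 < m) m = cur[j - 1] + 1        # insertion
                if (prev[j - 1] + cost < m) m = prev[j - 1] + cost # substitution
                cur[j] = m
            }
            for (j = 0; j <= lb; j++) prev[j] = cur[j]
        }
        print prev[lb]
    }'
}

# msg_ratio MESSAGE_CHARS PRE POST
# Hypothetical score: commit message characters per unit of edit
# distance. Prints "inf" when the two strings are identical.
msg_ratio() {
    d=$(levenshtein "$2" "$3")
    awk -v n="$1" -v d="$d" 'BEGIN {
        if (d == 0) print "inf"
        else printf "%.1f\n", n / d
    }'
}
```

With that, "levenshtein int unsigned" prints 6 and 'levenshtein "" _lf'
prints 3, same as the perl one-liners above. The character count for
msg_ratio could come from the "wc" output mentioned in [5].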