On Wed, Nov 2, 2022 at 7:43 AM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>
> On Wed, Nov 02 2022, Jeff King wrote:
>
> > On Wed, Nov 02, 2022 at 01:14:59AM -0700, Elijah Newren wrote:
> >
> >> On Wed, Nov 2, 2022 at 12:51 AM Jeff King <peff@xxxxxxxx> wrote:
> >> >
> >> > Here are patches which fix them both. I may be setting a new record for
> >> > the ratio of commit message lines to changed code
> >>
> >> It looks like the first patch is 72 lines of commit message for a
> >> one-line fix, and the second patch is 61 lines of commit message for a
> >> two line fix.
> >>
> >> I don't know what the record ratio is, but it's at least 96[1], so
> >> clearly you'll need to figure out how to pad your first commit message
> >> with at least another 25 lines before this series can be accepted.
> >> ;-)
> >
> > Well, if we want to start digging things up... ;)
> >
> > Try this:
> >
> >   git log --no-merges --no-renames --format='%H %B' -z --numstat '*.c' |
> >   perl -0ne '
> >     chomp;
> >     if (s/^([0-9a-f]{40}) //) {
> >       if (defined $commit && $diff) {
> >         my $ratio = $body / $diff;
> >         print "$ratio $body $diff $commit\n";
> >       }
> >       $commit = $1;
> >       $body = () = /\n/g;
> >       $diff = 0;
> >     } elsif (/^\s*(\d+)\t/) {
> >       # this counts only added lines, under the assumption that
> >       # small commits generally remove/add in proportion. Of course
> >       # ones that _only_ remove lines have infinite ratios.
> >       $diff += $1;
> >     } else {
> >       die "confusing record: $_\n";
> >     }
> >   ' |
> >   sort -rn |
> >   head
> >
> > which shows there are a few in the 100's. Pipe through:
> >
> >   awk '{print $4}' |
> >   git log --stdin --no-walk=unsorted --stat
> >
> > for a nicer view. I'm rejecting the top one on the grounds that it's
> > mostly cut-and-paste output, and also that #2 is mine. ;)
>
> I think that '*.c' is cheating, if anything I should be getting more
> points when you remove that, as I've been over explaining
> adding/removing a compiler flag or something. At least your #2 is tricky
> C code :)
>
> I haven't bothered to do this, but I think if you --word-diff
> --word-diff-regex=. and parse the resulting diff you'd get "better"
> results.
>
> Or, for better & similar (but not the same): compute the levenshtein
> distance of the pre- and post-image, and compute edit distance to commit
> message length.
>
> I haven't done that, but just from eyeballing it I think [1] beats your
> [2] by that criteria. Per:
>
>     $ perl -MText::Levenshtein=distance -wE 'say distance @ARGV' int unsigned
>     6
>     $ perl -MText::Levenshtein=distance -wE 'say distance @ARGV' "" _lf
>     3
>
> It should get 2x the score v.s. yours, but yours is <2x the
> words/characters.
>
> (Edit: But see [4] below)
>
> There's also e.g. my [3] that's fairly high in the running per your
> "only added lines". But I think it shows the perils of doing that,
> i.e. in general I don't see why you'd omit deletions, that commit
> message is certainly spending most of its time talking about why the
> deletion of the code at hand is OK.
>
> Once you count deletions it'll get *way* down the list, as it's 11
> deleted lines, 1 added.
>
> Hrm, I take some of the above back, I think [4] might be the winner.
> That's just an edit distance of 1, so it's around 2x the commit message
> length of yours if we adjust for your score of 6. (~2.5 by
> characters)[5].
>
> 1. 356c4732950 (credential: treat CR/LF as line endings in the
>    credential protocol, 2020-10-03)
> 2. aec0bba106d (config: work around gcc-10 -Wstringop-overflow warning,
>    2020-08-04)
> 3. f97fe358576 (pickaxe -G: don't special-case create/delete,
>    2021-04-12)
> 4. c58bebd4c67 (ci: update Cirrus-CI image to FreeBSD 12.3, 2022-05-25)
> 5. All measured with "git show --no-notes --no-patch <commit> | wc",
>    because I was lazy.

Hehe, my offhand joke started a contest over the whimsical question of
who's the most long-winded. I think my work here is done. :-)
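
As an aside, for anyone who wants to reproduce the edit-distance figures
quoted above without installing Text::Levenshtein, here is a minimal sketch
in Python. It is not what anyone in the thread actually ran. The function
names levenshtein() and message_per_edit() are made up for illustration, and
dividing message lines by edit distance is only one possible reading of
"compute edit distance to commit message length".

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def message_per_edit(message_lines: int, pre: str, post: str) -> float:
    """Hypothetical score: commit message lines per unit of edit distance."""
    dist = levenshtein(pre, post)
    # A commit that changes nothing has an unbounded ratio.
    return float("inf") if dist == 0 else message_lines / dist


if __name__ == "__main__":
    print(levenshtein("int", "unsigned"))  # 6, as in the Perl one-liner
    print(levenshtein("", "_lf"))          # 3, as in the Perl one-liner
```

Feeding it the pre- and post-image of a changed line together with the
message line count gives a score that, unlike the --numstat approach, does
not ignore deletions.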