Re: gigantic commit messages, was Re: Git Bug Report: out of memory using git tag

On Wed, Nov 2, 2022 at 7:43 AM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>
> On Wed, Nov 02 2022, Jeff King wrote:
>
> > On Wed, Nov 02, 2022 at 01:14:59AM -0700, Elijah Newren wrote:
> >
> >> On Wed, Nov 2, 2022 at 12:51 AM Jeff King <peff@xxxxxxxx> wrote:
> >> >
> >> > Here are patches which fix them both. I may be setting a new record for
> >> > the ratio of commit message lines to changed code
> >>
> >> It looks like the first patch is 72 lines of commit message for a
> >> one-line fix, and the second patch is 61 lines of commit message for a
> >> two line fix.
> >>
> >> I don't know what the record ratio is, but it's at least 96[1], so
> >> clearly you'll need to figure out how to pad your first commit message
> >> with at least another 25 lines before this series can be accepted.
> >> ;-)
> >
> > Well, if we want to start digging things up... ;)
> >
> > Try this:
> >
> >   git log --no-merges --no-renames --format='%H %B' -z --numstat '*.c' |
> >   perl -0ne '
> >     chomp;
> >     if (s/^([0-9a-f]{40}) //) {
> >       if (defined $commit && $diff) {
> >         my $ratio = $body / $diff;
> >         print "$ratio $body $diff $commit\n";
> >       }
> >       $commit = $1;
> >       $body = () = /\n/g;
> >       $diff = 0;
> >     } elsif (/^\s*(\d+)\t/) {
> >       # this counts only added lines, under the assumption that
> >       # small commits generally remove/add in proportion. Of course
> >       # ones that _only_ remove lines have infinite ratios.
> >       $diff += $1;
> >     } else {
> >       die "confusing record: $_\n";
> >     }
> >   ' |
> >   sort -rn |
> >   head
> >
> > which shows there are a few in the 100s. Pipe through:
> >
> >   awk '{print $4}' |
> >   git log --stdin --no-walk=unsorted --stat
> >
> > for a nicer view. I'm rejecting the top one on the grounds that it's
> > mostly cut-and-paste output, and also that #2 is mine. ;)
>
> I think that '*.c' is cheating; if anything, I should be getting more
> points when you remove that, as I've been over-explaining
> adding/removing a compiler flag or something. At least your #2 is tricky
> C code :)
>
> I haven't bothered to do this, but I think if you --word-diff
> --word-diff-regex=. and parse the resulting diff you'd get "better"
> results.
>
> Or, for better & similar (but not the same): compute the Levenshtein
> distance of the pre- and post-image, and compare that edit distance to
> the commit message length.
>
> I haven't done that, but just from eyeballing it I think [1] beats your
> [2] by that criterion. Per:
>
>         $ perl -MText::Levenshtein=distance -wE 'say distance @ARGV' int unsigned
>         6
>         $ perl -MText::Levenshtein=distance -wE 'say distance @ARGV' "" _lf
>         3
>
> It should get 2x the score vs. yours, but yours is <2x the
> words/characters.
>
> (Edit: But see [4] below)
>
> There's also e.g. my [3] that's fairly high in the running per your
> "only added lines". But I think it shows the perils of doing that,
> i.e. in general I don't see why you'd omit deletions, that commit
> message is certainly spending most of its time talking about why the
> deletion of the code at hand is OK.
>
> Once you count deletions it'll get *way* down the list, as it's 11
> deleted lines, 1 added.
>
> Hrm, I take some of the above back, I think [4] might be the winner.
> That's just an edit distance of 1, so it's around 2x the commit message
> length of yours if we adjust for your score of 6. (~2.5 by
> characters)[5].
>
> 1. 356c4732950 (credential: treat CR/LF as line endings in the
>    credential protocol, 2020-10-03)
> 2. aec0bba106d (config: work around gcc-10 -Wstringop-overflow warning,
>    2020-08-04)
> 3. f97fe358576 (pickaxe -G: don't special-case create/delete,
>    2021-04-12)
> 4. c58bebd4c67 (ci: update Cirrus-CI image to FreeBSD 12.3, 2022-05-25)
> 5. All measured with "git show --no-notes --no-patch <commit> | wc",
>    because I was lazy.

Hehe, my offhand joke started a contest over the whimsical question of
who's the most long-winded.  I think my work here is done.  :-)
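
For anyone who wants to try the edit-distance scoring suggested in the quoted message without installing Text::Levenshtein, here is a minimal Python sketch. The `levenshtein` function is the textbook dynamic-programming algorithm; `message_score` is a hypothetical helper (its name and its "message characters per unit of edit distance" definition are assumptions, not something anyone in the thread specified).

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance: unit-cost insert, delete, substitute."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def message_score(message: str, pre_image: str, post_image: str) -> float:
    """Hypothetical metric: commit message characters per unit of edit distance."""
    dist = levenshtein(pre_image, post_image)
    if dist == 0:
        raise ValueError("pre- and post-image are identical")
    return len(message) / dist


if __name__ == "__main__":
    # The two pairs from the quoted message.
    print(levenshtein("int", "unsigned"))  # 6
    print(levenshtein("", "_lf"))          # 3
```

Feeding it the output of "git show --no-notes --no-patch <commit>" as `message`, plus the changed text before and after, would approximate the comparison described above; measuring the message in characters rather than lines or words is an arbitrary choice here.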



