Re: [git 2.26] stat counts reported by commit and log are different

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Apr 09, 2020 at 02:58:25PM -0700, Norbert Kiesel wrote:

> Thanks for the explanation. I still wonder why break-detection is by
> default enabled for commit but disabled for log.  Is there any
> rationale for this?

It's enabled for commit, because it was enable for git-status, via
3eb2a15eb3 (builtin-commit: make summary output consistent with status,
2007-12-16). It was enabled for status because it makes rename detection
more accurate; see the discussion at the end of this thread:

  https://lore.kernel.org/git/20071121171235.GA32233@xxxxxxxxxxxxxxxxxxxxx/

That tells us why it _is_ enable for commit, but not why it's not for
log. Traditionally rename detection was _not_ enabled by default there.
Because it can be expensive, we were quicker to enable it for
single-diff commands like commit (as opposed to log, which is doing a
bunch of diffs).

Much later, in 5404c116aa (diff: activate diff.renames by default,
2016-02-25), we turned on renames by default for git-log. But as far as
I recall, nobody gave much thought to turning on break detection at the
same time.

It might make sense to do so (and/or to make it possible to enable it by
config like we did for years with diff.renames). But it definitely is
way more expensive. For a tree-only diff, we don't usually have to look
at most blobs at all. Even with rename detection, we only have to look
at inexact rename candidates. But break detection must look at every
single modified file (timings are on git.git):

  [no renames at all, pretty fast]
  $ time git log --no-renames --raw >/dev/null
  real	0m2.231s
  user	0m2.203s
  sys	0m0.028s

  [adding in renames isn't very expensive]
  $ time git log -M --raw >/dev/null
  real	0m2.284s
  user	0m2.224s
  sys	0m0.060s

  [but break detection is]
  $ git log -B -M --raw >/dev/null
  real	0m33.377s
  user	0m32.942s
  sys	0m0.424s

A more fair comparison would actually generate content-level diffs,
which need to look in the blobs, like:

  $ time git log -M -p >/dev/null
  real	0m23.287s
  user	0m22.854s
  sys	0m0.432s

  $ time git log -B -M -p >/dev/null
  real	0m49.763s
  user	0m49.282s
  sys	0m0.480s

So not quite as bad percentage-wise, but still pretty expensive. And for
not a huge benefit. There are ~261 impacted commits. You can see a
recent example with:

  git show -B -M --stat --summary ce6521e44

where we find that most of builtin/fmt-merge-msg.c was moved to
fmt-merge-msg.c. It's nice, but it's expensive enough that it probably
shouldn't be the default.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux