Re: 30min Script in git 2.7.4 takes 22+ hrs in git 2.9.3

Jeff King <peff@xxxxxxxx> · Thu, 27 Apr 2017 16:09:57 -0400

On Thu, Apr 27, 2017 at 12:36:54PM -0400, Robert Stryker wrote:

> The problem:  the script takes 30 minutes for one environment
> including git 2.7.4, and generates a repo of about 30mb.   When run by
> a coworker using git 2.9.3, it takes 22+ hours and generates a 10gb
> repo.
> 
> Clearly something here is very wrong. Either there's a pretty horrible
> regression or my idea is a pretty bad one ;)

The large size makes me think that you're getting an auto-gc in the
middle that is exploding the unreachable objects into loose storage.
This can happen when objects are ready to be pruned, but Git holds on to
them for a grace period (2 weeks by default) as a precaution against
simultaneous use.

Try doing:

  git config gc.auto 0

in the repositories before the slow step. Or alternatively, try:

  git config gc.pruneExpire now

which will continue to do the auto-gc, but throw away unreachable
objects immediately.

Or alternatively, we're failing to run gc at all and just getting tons
of loose objects that need to be packed. What does "git gc --auto" say
if you run it in the slow repository? Does it improve the disk space
problem?
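
To tell those two cases apart, it may help to look at how the objects
are stored before and after the gc. The following is only a rough
sketch: it summarizes the "count", "size", "in-pack" and "size-pack"
fields that "git count-objects -v" prints (sizes are in KiB). The
helper name summarize_objects is made up for illustration.

```shell
#!/bin/sh
# Summarize loose vs. packed storage from "git count-objects -v" output.
# The report is read on stdin, so saved output from either git version
# can be compared as well.
summarize_objects() {
    awk -F': ' '
        $1 == "count"     { loose = $2 }      # number of loose objects
        $1 == "size"      { loose_kib = $2 }  # disk used by loose objects
        $1 == "in-pack"   { packed = $2 }     # number of packed objects
        $1 == "size-pack" { pack_kib = $2 }   # disk used by packs
        END {
            printf "loose: %d objects, %d KiB\n", loose, loose_kib
            printf "packed: %d objects, %d KiB\n", packed, pack_kib
        }
    '
}

# Usage, from inside the slow repository:
#   git count-objects -v | summarize_objects
```

A large loose count with few packed objects would point at the gc
behavior rather than at the size of the data itself.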

Even if one of those helps, I'd still like to know why the gc behavior
changed between the two versions. The best way to do that is via
git-bisect.

You should be able to do:

  # make sure you can compile git from source
  git clone git://git.kernel.org/pub/scm/git/git.git
  cd git
  make

  git bisect start
  git bisect good v2.7.4
  git bisect bad v2.9.3

  # for each commit bisect dumps you at, run your test. The bin-wrappers
  # part is important, because it sets up the environment to run
  # sub-programs from the built version. And as pull is a shell script,
  # the problem is likely in a sub-program.
  /path/to/git/bin-wrappers/git pull ...

  # And then mark whether it was fast or slow. You obviously don't need
  # to run the program to completion; just enough to decide if it's fast
  # or slow (which might be better done by observing disk space rather
  # than timing).
  git bisect good ;# or "bad" if it was slow

It's going to be tedious even if it takes 30 minutes per iteration. It
might be worth trying to adjust the test case for smaller repos. :)
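
If the manual loop gets too tedious, "git bisect run" can drive it with
a script. What follows is a sketch under several assumptions that are
not from your report: REPRO is a hypothetical script that rebuilds your
test repository in $WORKDIR using whichever git comes first in PATH,
and LIMIT_KIB is a size that separates a fast run from a slow one. The
10-minute timeout and the 1 GiB limit are placeholders, and "timeout"
is the GNU coreutils tool.

```shell
#!/bin/sh
# bisect-step.sh: hypothetical helper for "git bisect run".
# All three settings below are placeholders to adjust.
REPRO=${REPRO:-/path/to/repro.sh}
WORKDIR=${WORKDIR:-/tmp/bisect-repo}
LIMIT_KIB=${LIMIT_KIB:-1048576}   # 1 GiB, expressed in KiB

# Succeeds (good) if size $1 is within limit $2, fails (bad) otherwise.
classify_size() {
    [ "$1" -le "$2" ]
}

bisect_step() {
    # Exit code 125 tells "git bisect run" to skip this commit.
    make >/dev/null 2>&1 || return 125
    rm -rf "$WORKDIR"
    # Put the freshly built git first in PATH so that sub-programs come
    # from the same build; stop early once the growth should be visible.
    PATH="$PWD/bin-wrappers:$PATH" timeout 600 "$REPRO" "$WORKDIR"
    [ -d "$WORKDIR" ] || return 125
    size=$(du -sk "$WORKDIR" | cut -f1)
    classify_size "$size" "$LIMIT_KIB"
}

# Only run the step when called with "run", so that the functions can
# also be sourced and tried in isolation.
if [ "${1:-}" = "run" ]; then
    bisect_step
fi
```

It would be used roughly like this, from the git source directory:

  git bisect start v2.9.3 v2.7.4
  git bisect run /path/to/bisect-step.sh run

Judging by disk usage rather than wall-clock time keeps each step short
and avoids noise from machine load.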

It may also be worth trying the test with the latest tip of "master".
v2.9.3 is several versions behind, and it's possible that something has
been fixed since then (nothing comes immediately to mind, but it's
worth a shot).

-Peff