Re: 30min Script in git 2.7.4 takes 22+ hrs in git 2.9.3

Jeff King <peff@xxxxxxxx> · Thu, 27 Apr 2017 16:42:19 -0400

On Thu, Apr 27, 2017 at 04:09:56PM -0400, Jeff King wrote:

> On Thu, Apr 27, 2017 at 12:36:54PM -0400, Robert Stryker wrote:
> 
> > The problem:  the script takes 30 minutes for one environment
> > including git 2.7.4, and generates a repo of about 30mb.   When run by
> > a coworker using git 2.9.3, it takes 22+ hours and generates a 10gb
> > repo.
> > 
> > Clearly something here is very wrong. Either there's a pretty horrible
> > regression or my idea is a pretty bad one ;)
> 
> The large size makes me think that you're getting an auto-gc in the
> middle that is exploding the unreachable objects into loose storage.
> This can happen when objects are ready to be pruned, but Git holds on to
> them for a grace periods (2 weeks by default) as a precaution against
> simultaneous use.
> 
> Try doing:
> 
>   git config gc.auto 0
> 
> in the repositories before the slow step. Or alternatively, try:
> 
>   git config gc.pruneExpire now
> 
> which will continue to do the auto-gc, but throw away unreachable
> objects immediately.
> 
> Or alternatively, we're failing to run gc at all and just getting tons
> of loose objects that need packed. What does running "git gc --auto" say
> if you run it in the slow repository? Does it improve the disk space
> problem?

Fiddling with your script a bit, I have a suspect. Between your two
versions of git, we started disallowing merge of unrelated histories by
default[1]. Which is exactly what your script is doing:

  echo "Merge in the four rewritten projects, with generic commit messages"
  git pull --no-edit webtools.common.fproj     
  git pull --no-edit webtools.common           
  git pull --no-edit webtools.common.tests     
  git pull --no-edit webtools.common.snippets

If you run under "set -e", or just put "|| exit 1" after those, you'll
see that they fail with v2.9.3 and newer.

So what I think is happening is that we never create that shared
history, and then your per-tag work is building further on a nonsense
fake history. That has two implications:

  - as the divergent history in the shared repo gets bigger and bigger,
    the fetches have to do more and more work to try to find a common
    ancestor (but of course they'll never find one, because the two
    histories aren't related)

  - the divergent history racks up tons of unreachable objects, which
    auto-gc won't pack. After a while of the script running, you can see
    that auto-gc fails with "There are too many unreachable loose
    objects" after the pack. Due to the way background gc works these
    days, that blocks further auto-gc from running until the situation
    is resolved. And you just rack up tons of loose objects, which
    explains the disk usage.

Try adding "--allow-unrelated-histories" to your git-pull invocation.

-Peff

[1] See e379fdf34 (merge: refuse to create too cool a merge by default, 2016-03-18)