Re: [PATCH] gc: do not warn about too many loose objects

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 17 Jul 2018 17:24:50 +0200

On Tue, Jul 17 2018, Jonathan Nieder wrote:

> Hi Ævar,
>
> Ævar Arnfjörð Bjarmason wrote:
>> On Tue, Jul 17 2018, Jonathan Nieder wrote:
>
>>> That suggests a possible improvement.  If all callers should be
>>> ignoring the exit code, can we make the exit code in daemonize mode
>>> unconditionally zero in this kind of case?
>>
>> That doesn't make sense to me. Just because git itself is happily
>> ignoring the exit code I don't think that should mean there shouldn't be
>> a meaningful exit code. Why don't you just ignore the exit code in the
>> repo tool as well?
>
> I don't maintain the repo tool.  I am just trying to improve git to
> make some users' lives better.
>
> repo worked fine for years, until gc's daemonized mode broke it.  I
> don't think it makes sense for us to declare that it has been holding
> git gc wrong for all that time before then.

Before the sequence of commits noted at the bottom of my
https://public-inbox.org/git/87inc89j38.fsf@xxxxxxxxxxxxxxxxxxx/ plus
a831c06a2b ("gc: ignore old gc.log files", 2017-02-10) we wouldn't do
that, that's true.

We'd just happily run GC again pointlessly even though it wasn't going
to accomplish anything, in cases where you really did have ~>6700 loose
objects that weren't going to be pruned.

I don't think it makes sense to revert those patches and go back to the
much more naïve (and user waiting there twiddling his thumbs while it
runs) method, but I also don't think making no distinction between the
following states:

 1. gc --auto has nothing to do
 2. gc --auto has something to do, will fork and try to do it
 3. gc --auto has something to do, but notices that gc has been failing
    before and can't do anything now.

I think #3 should exit with non-zero.

Yes, before this whole backgrounding etc. we wouldn't have exited with
non-zero either, we'd just print a thing to STDERR.

But with backgrounding we implicitly introduced a new state, which is we
won't do *any* maintenance at all, including consolidating packfiles
(very important for servers), so I think it makes sense to exit with
non-zero since gc can't run at all.

>> Now e.g. automation I have to see if git-gc ---auto is having issues
>> can't just be 'git gc --auto || echo fail @ {hostname}' across a fleet
>> of servers, but will need to start caring if stderr was emitted to.
>
> Thanks for bringing this up.  The thing is, the current exit code is
> not useful for that purpose.  It doesn't report a failure until the
> *next* `git gc --auto` run, when it is too late to be useful.
>
> See the commit message on the proposed patch
> https://public-inbox.org/git/20180717065740.GD177907@xxxxxxxxxxxxxxxxxxxxxxxxx/
> for more on that subject.

Right. I know. What I mean is now I can (and do) use it to run 'git gc
--auto' across our server fleet and see whether I have any of #3, or
whether it's all #1 or #2. If there's nothing to do in #1 that's fine,
and it just so happens that I'll run gc due to #2 that's also fine, but
I'd like to see if gc really is stuck.

This of course relies on them having other users / scripts doing normal
git commands which would trigger previous 'git gc --auto' runs.

>> I don't care if we e.g. have a 'git gc --auto --exit-code' similar to
>> what git-diff does, but it doesn't make sense to me that we *know* we
>> can't background the gc due to a previous error and then always return
>> 0. Having to parse STDERR to see if it *really* failed is un-unixy,
>> let's use exit codes. That's what they're for.
>
> Do you know of anyone trying to use the exit code from gc --auto in
> this way?
>
> It sounds to me like for what you're proposing, a separate
>
> 	git gc --auto --last-run-failed
>
> command that a tool could use to check the outcome from the previous
> gc --auto run without triggering a new one would be best.

Yeah this is admittedly a bit of a poweruser thing I'm doing, so I don't
mind if it's hidden behind something like that in principle, but it
seems wrong to exit with zero in this (#3) case:

    $ git gc --auto; echo $?
    Auto packing the repository in background for optimum performance.
    See "git help gc" for manual housekeeping.
    error: The last gc run reported the following. Please correct the root cause
    and remove .git/gc.log.
    Automatic cleanup will not be performed until the file is removed.

    warning: There are too many unreachable loose objects; run 'git prune' to remove them.

    255

As a previous (good) patch in this series notes that shouldn't be 255,
let's fix that, but I don't see how emitting an "error" and "warning"
saying we're broken and can't gc at all here in conjunction with exiting
with zero makes sense.

> [...]
>> I think you're conflating two things here in a way that (if I'm reading
>> this correctly) produces a pretty bad regression for users.
>
> The patch doesn't touch the "ratelimiting" behavior at all, so I'm not
> sure what I'm doing to regress users.

Sorry about being unclear again, this is not a comment on this patch,
but your paragraph beginning with "To solve[...]", i.e. "this patch
doesn't do X, but in the future maybe we'll...". I'm pointing out
potential caveats with that, I realize you have not posted an
implementation of that yet.

> [...]
>>> To solve (3), we could
>>> introduce a gc.lastrun file that is touched whenever "gc --auto" runs
>>> successfully and make "gc --auto" a no-op for a while after each run.
>>
>> This would work around my concern with b) above in most cases by
>> introducing a purely time-based rate limit, but I'm uneasy about this
>> change in git-gc semantics.
> [..]
>> With these proposed semantics we'd skip a needed GC (potentially for
>> days, depending on the default) just because we recently ran one.
>>
>> In many environments, such as on busy servers, we could have tens of
>> thousands of packs by this point, since this facility would (presumably)
>> bypass both gc.autoPackLimit and gc.auto in favor of only running gc at
>> a maximum of every N minutes, similarly in a local checkout I could have
>> a crapload of loose objects because I ran a big rebase or a
>> filter-branch on one of my branches.
>
> I think we all agree that getting rid of the hack that 'explodes'
> unreachable objects into loose objects is the best way forward, long
> term.
>
> Even in that future, some ratelimiting may be useful, though.  I also
> suspect that we're never going to achieve a perfect set of defaults
> that works well for both humans and servers, though we can try.

Having read this whole thing over again, and with some time to reflect
on it, I think the best way forward for now, in lieu of the bigger
solution of consolidating loose objects into a pack, is to split up this
special case of warning due to too many loose objects at the end, and
any other errors we may print during GC.

As noted above if our policy for gc.auto is legitimately exceeded, we
deal with that badly by stopping all gc, includin gc that would just run
due to too_many_packs().

Instead we should note that we had an error due to too many loose
objects last time, but still try to run if the too_many_packs()
condition is satisfied.

Now we're throwing the baby out with the bathwater and not consolidating
packs (for a default period of 2 weeks!) just because some operation
produced a lot of loose objects, and not writing bitmaps, commit graph
etc.

It would also be nice to expose via an exit code "do we need to gc?" and
"is gc failing here?" separately from the (currently working) "run gc or
report last error + code" mode we have with "gc --auto".