Re: [PATCH] gc: do not warn about too many loose objects

Jonathan Nieder <jrnieder@xxxxxxxxx> · Mon, 16 Jul 2018 13:37:53 -0700

Jeff King wrote:
> On Mon, Jul 16, 2018 at 12:54:31PM -0700, Jonathan Nieder wrote:

>> Even restricting attention to the warning, not the exit code, I think
>> you are saying the current behavior is acceptable.  I do not believe
>> it is.
>
> I didn't say that at all. The current situation sucks,

Thanks for clarifying.  That helps.

>                                                        and I think the
> right solution is to pack the unreachable objects instead of exploding
> them.

That seems like a huge widening in scope relative to what this
original patch tackled.  I'm not aware of a way to do this without
breaking compatibility with the current (broken) race prevention.  If
you're saying that breaking compatibility in that way is okay with
you, then great!

[...]
>> I think you are saying Jonathan's patch makes the behavior
>> worse, and I'm not seeing it.  Can you describe an example user
>> interaction where the current behavior would be better than the
>> behavior after Jonathan's patch?  That might help make this discussion
>> more concrete.
>
> It makes it worse because there is nothing to throttle a "gc --auto"
> that is making no progress.
[...]
> With the current code, that produces a bunch of annoying warnings for
> every commit ("I couldn't gc because the last gc reported a warning").
> But after Jonathan's patch, every single commit will do a full gc of the
> repository. In this tiny repository that's relatively quick, but in a
> real repo it's a ton of CPU and I/O, all for nothing.

I see.  Do I understand correctly that if we find a way to print the
warning but not error out, that would be good enough for you?

[...]
>> Have you looked over the discussion in "Loose objects and unreachable
>> objects" in Documentation/technical/hash-function-transition.txt?  Do
>> you have thoughts on it (preferrably in a separate thread)?
>
> It seems to propose putting the unreachable objects into a pack. Which
> yes, I absolutely agree with (as I thought I'd been saying in every
> single email in this thread).

I figured you were proposing something like
https://public-inbox.org/git/20180113100734.GA30698@xxxxxxxxxxxxxxxxxxxxx/,
which is still racy (because it does not handle the freshening in a safe
way).

[...]
>           Even if we were going to remove this message to help the
> daemonized case, I think we'd want to retain it for the non-daemon case.

Interesting.  That should be doable, e.g. following the approach
described below.

[...]
>> A simple way to do that without changing formats is to truncate the
>> file when exiting with status 0.
>
> That's a different behavior than what we do now (and what was suggested
> earlier), in that it assumes that anything written to stderr is OK for
> gc to hide from the user if the process exits with code zero.
>
> That's probably OK in most cases, though I wonder if there are corner
> cases. For example, if you have a corrupt ref, we used to say "error:
> refs/heads/foo does not point to a valid object!" but otherwise ignore
> it. These days git-gc sets REF_PARANOIA=1, so we'll actually barf on a
> corrupt ref. But I wonder if there are other cases lurking.

What decides it for me is that the user did not invoke "git gc --auto"
explicitly, so anything "git gc --auto" prints is tangential to what
the user was trying to do.  If the gc failed, that is worth telling
them, but if e.g. it encountered a disk I/O error reading and
succeeded on retry (to make up a fake example), then that's likely
worth logging to syslog but it's not something the user asked to be
directly informed about.

Thanks,
Jonathan