Re: [PATCH] gc: do not warn about too many loose objects

Jeff King <peff@xxxxxxxx> · Mon, 16 Jul 2018 17:09:38 -0400

On Mon, Jul 16, 2018 at 01:37:53PM -0700, Jonathan Nieder wrote:

> >                                                        and I think the
> > right solution is to pack the unreachable objects instead of exploding
> > them.
> 
> That seems like a huge widening in scope relative to what this
> original patch tackled.

I'm not at all saying that Jonathan needs to work on that. It's a big
topic that we've been putting off for years. I _am_ saying that I don't
think his patch as-is should be merged, as it makes the daemonized
behavior here worse.

> I'm not aware of a way to do this without
> breaking compatibility with the current (broken) race prevention.  If
> you're saying that breaking compatibility in that way is okay with
> you, then great!

It depends what you're trying to fix. If you're trying to fix auto-gc
outputting an annoying message (or repeatedly doing useless invocations
of pack-objects), you don't have to break compatibility at all. You just
have to put the objects into a pack instead of exploding them loose.
Worst case if you use an old implementation to run "git gc", it may
accidentally freshen a cruft pack's timestamp.

If you want to improve the raciness, then yes, I think you need stronger
rules around marking packs as cruft/garbage, and making operations that
want to use those objects copy them out to a non-cruft location (like
you describe in the hash transition document). There you risk corruption
if an old implementation fails to make the non-cruft copy. But that's
really no worse than we are today.

So I don't even really see it as a backwards-compatibility break. But
even if it were, I'd probably still be fine with it. This is a
local-repo thing, and in the absolute worst case we could bump
core.repositoryformatversion (though again, I don't even think that
would be necessary since it would degrade to behaving just as the
current code does).

> > With the current code, that produces a bunch of annoying warnings for
> > every commit ("I couldn't gc because the last gc reported a warning").
> > But after Jonathan's patch, every single commit will do a full gc of the
> > repository. In this tiny repository that's relatively quick, but in a
> > real repo it's a ton of CPU and I/O, all for nothing.
> 
> I see.  Do I understand correctly that if we find a way to print the
> warning but not error out, that would be good enough for you?

Yes. I thought that's what I proposed originally. ;)

The key thing is that the presence of the warning/log still suppress
the actual gc for the gc.logExpiry period.

> >> Have you looked over the discussion in "Loose objects and unreachable
> >> objects" in Documentation/technical/hash-function-transition.txt?  Do
> >> you have thoughts on it (preferrably in a separate thread)?
> >
> > It seems to propose putting the unreachable objects into a pack. Which
> > yes, I absolutely agree with (as I thought I'd been saying in every
> > single email in this thread).
> 
> I figured you were proposing something like
> https://public-inbox.org/git/20180113100734.GA30698@xxxxxxxxxxxxxxxxxxxxx/,
> which is still racy (because it does not handle the freshening in a safe
> way).

That message is just me telling somebody that their idea _won't_ work. ;)

But I assume you mean the general idea of putting things in a cruft
pack[1]. Yes, I was only suggesting going that far as a solution to the
auto-gc woes. Solving races is another issue entirely (though obviously
it may make sense to build on the single-pack work, if it exists).

[1] The best message discussion on that is I think:

      https://public-inbox.org/git/20170610080626.sjujpmgkli4muh7h@xxxxxxxxxxxxxxxxxxxxx/

    which I think I linked earlier, so possibly that is even what you
    meant. :)

> What decides it for me is that the user did not invoke "git gc --auto"
> explicitly, so anything "git gc --auto" prints is tangential to what
> the user was trying to do.  If the gc failed, that is worth telling
> them, but if e.g. it encountered a disk I/O error reading and
> succeeded on retry (to make up a fake example), then that's likely
> worth logging to syslog but it's not something the user asked to be
> directly informed about.

I'm not sure I agree. If a repack discovered that you had a bit
corruption on disk, but you happened to have another copy of the object
available that made the repack succeed, I'd like to know that I'm
getting bit corruptions, and earlier is better. I think we need to
surface that information to the user eventually, and I don't think we
can count on syslog being universally available.

By definition we're daemonized here, so we can't count on even having
access to the user's terminal. But it would make more sense to me if the
logic were:

  - at the end of a "gc --auto" run, gc should write a machine-readable
    indication of its exit code (either as a final line in the log file,
    or to a gc.exitcode file). The warnings/errors remain intermingled
    in the logfile.

  - on the _next_ run, if:

    - the exit code was non-zero, dump the log output to stderr and exit
      with the same code. This is the same as the current behavior (with
      the exception that we propagate the code instead of returning -1,
      which is a minor bonus).

    - the exit code was zero, report to the user synchronously that the
      last run found some oddities, and dump the log. Proceed with the
      rest of auto-gc (either exiting 0 if nothing to be done, or doing
      useful work).

      Optionally, delete the log after showing it so the user is not
      spammed with the warning on every commit. Though for the warning
      under discussion, this does mean that if they don't take any
      action, they'll end up running a useless gc every _other_ commit.
      So I'd probably keep the log and let it expire naturally after the
      1-day gc.logExpiry period.

-Peff