Re: [RFC] git gc "--prune=now" semantics considered harmful

On Sun, May 27, 2018 at 08:31:14AM +0900, Junio C Hamano wrote:

> > So I actually would much prefer that for git gc, "--prune=now" means
> >
> >  (a) "now"
> >
> >  (b) now at the _start_ of the "git gc" operation, not the time at
> >      the _end_ of the operation when we've already spent a minute or
> >      two doing repacking and are now doing the final pruning.
> >
> > Anyway, with that explanation in mind, I'm appending a patch that is
> > pretty small and does that. It's a bit hacky, but I think it still makes
> > sense.
> >
> > Comments?
> 
> Closing the possibility of racing a running "gc" and new object
> creation like the above generally makes sense, I would think,
> whether the creation is due to 'pull/fetch', 'add', or even 'push'.

I think Linus's suggestion is an obvious improvement. It does shorten
the window for confusing things to happen, and I think it makes things
much easier to reason about if all parts of the gc are using the same
timestamp.
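To make "all parts of the gc use the same timestamp" concrete, here is a minimal C sketch. It is not git's code: `struct gc_ctx`, `gc_begin()` and `may_prune()` are hypothetical names, and the strict "older than" comparison is an assumption for illustration. The cutoff is computed once when gc starts, and every later phase compares object mtimes against that stored value instead of re-reading the clock.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical gc context: the prune cutoff is fixed once, before any work starts. */
struct gc_ctx {
	time_t start;	/* wall-clock time at which gc began */
	time_t cutoff;	/* unreachable objects older than this may be pruned */
};

/* Resolve "older than `grace` seconds" against a single start time. */
static struct gc_ctx gc_begin(time_t start, time_t grace)
{
	struct gc_ctx ctx;

	assert(grace >= 0);
	ctx.start = start;
	ctx.cutoff = start - grace;
	return ctx;
}

/*
 * Every later phase asks this same question against the same stored
 * cutoff, however late in the run it happens.
 */
static int may_prune(const struct gc_ctx *ctx, time_t object_mtime)
{
	return object_mtime < ctx->cutoff;
}
```

With a start of t=1000 and a grace period of zero (i.e. "now"), an object written at t=1050 by a concurrent fetch survives even if pruning only runs after a two-minute repack; recomputing the cutoff at t=1120 would have made it eligible for deletion.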

Regarding the implementation:

> > -	if (prune_expire && parse_expiry_date(prune_expire, &dummy))
> > -		die(_("failed to parse prune expiry value %s"), prune_expire);
> > +	if (prune_expire) {
> > +		if (!strcmp(prune_expire, "now"))
> > +			prune_expire = show_date(time(NULL), 0, DATE_MODE(ISO8601));
> > +		if (parse_expiry_date(prune_expire, &dummy))
> > +			die(_("failed to parse prune expiry value %s"), prune_expire);
> > +	}

We'd also accept relative times like "5.minutes.ago" (in fact, the
default is a relative 2.weeks.ago, though it's long enough that the
difference between "2 weeks" and "2 weeks plus 5 minutes" may not matter
much). So we probably ought to just normalize _everything_ without even
bothering to match "now". It's a no-op for non-relative times, but that's
OK.
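Normalizing everything could look roughly like the sketch below. It assumes a tiny made-up grammar ("now" and "<N>.<unit>.ago" for seconds through weeks) rather than git's real approxidate parser, and `normalize_expiry()`, `unit_seconds()` and `format_iso8601_utc()` are hypothetical helpers. The relative expression is resolved once against the gc start time, and the resulting absolute instant is what later phases would be handed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Seconds per unit for the toy grammar; a tiny subset of what git accepts. */
static long unit_seconds(const char *unit)
{
	static const struct { const char *name; long secs; } units[] = {
		{ "second", 1 }, { "minute", 60 }, { "hour", 3600 },
		{ "day", 86400 }, { "week", 7 * 86400 },
	};
	size_t len = strlen(unit);

	/* Accept both "minute" and "minutes". */
	if (len > 0 && unit[len - 1] == 's')
		len--;
	for (size_t i = 0; i < sizeof(units) / sizeof(units[0]); i++)
		if (strlen(units[i].name) == len && !strncmp(units[i].name, unit, len))
			return units[i].secs;
	return -1;
}

/*
 * Resolve `expire` against the fixed gc start time. Returns 0 and stores an
 * absolute timestamp in *out on success, -1 if the string is not understood.
 */
static int normalize_expiry(const char *expire, time_t start, time_t *out)
{
	long n;
	char unit[16];
	int end = 0;

	assert(out != NULL);
	if (!strcmp(expire, "now")) {
		*out = start;
		return 0;
	}
	/* %n records how much was consumed, so trailing junk is rejected. */
	if (sscanf(expire, "%ld.%15[a-z].ago%n", &n, unit, &end) == 2 &&
	    end > 0 && expire[end] == '\0') {
		long secs = unit_seconds(unit);

		if (secs < 0 || n < 0)
			return -1;
		*out = start - (time_t)n * secs;
		return 0;
	}
	return -1;
}

/* Render an absolute timestamp as ISO 8601 in UTC. */
static int format_iso8601_utc(time_t t, char *buf, size_t len)
{
	struct tm *tm = gmtime(&t);

	if (!tm)
		return -1;
	return strftime(buf, len, "%Y-%m-%d %H:%M:%S +0000", tm) ? 0 : -1;
}
```

For example, with a start time of 1527377474, "2.weeks.ago" resolves to 1526167874 and stays at that value no matter how long the repack takes.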

> I however have to wonder if there are opposite "oops" end-user
> operation we also need to worry about, i.e. we are doing a large-ish
> fetch, and get bored and run a gc from another terminal.  Perhaps
> *that* is a bit too stupid to worry about?  Auto-gc deliberately
> does not use 'now' because it wants to leave a grace period to avoid
> exactly that kind of race.

There are still possibilities for a race, even with the grace period.
You can have an unreferenced 2-week-old object sitting on disk, and
somebody can choose to reference it at the same time as we are pruning
it. My freshness patches from a few years ago made things a bit better:

  - when we optimize out the write of an existing object, we now at
    least update its timestamp

  - we consider non-fresh objects reachable from fresh ones to also be
    fresh
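The two freshness rules above can be modelled with a small in-memory object graph. This is a hypothetical sketch, not git's object-store or prune code. `write_object()` freshens an existing object's mtime instead of skipping the write. `prune()` keeps anything reachable from refs, any object newer than the cutoff, and everything reachable from such a fresh object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define MAX_OBJS 8
#define MAX_EDGES 4

/* Hypothetical object: real git stores objects on disk, named by hash. */
struct object {
	bool exists;
	time_t mtime;
	int nr_edges;
	int edges[MAX_EDGES];	/* indices of objects this one references */
};

struct store {
	struct object objs[MAX_OBJS];
};

/* Rule 1: writing an object that already exists freshens its mtime. */
static void write_object(struct store *s, int id, time_t now)
{
	assert(id >= 0 && id < MAX_OBJS);
	s->objs[id].exists = true;
	s->objs[id].mtime = now;
}

static void link_objects(struct store *s, int from, int to)
{
	struct object *o = &s->objs[from];

	assert(o->nr_edges < MAX_EDGES);
	o->edges[o->nr_edges++] = to;
}

/* Depth-first mark of everything reachable from `id`. */
static void mark(const struct store *s, int id, bool *keep)
{
	if (keep[id] || !s->objs[id].exists)
		return;
	keep[id] = true;
	for (int i = 0; i < s->objs[id].nr_edges; i++)
		mark(s, s->objs[id].edges[i], keep);
}

/*
 * Keep objects reachable from refs plus (rule 2) every fresh object and
 * everything reachable from one. Delete the rest; return how many.
 */
static int prune(struct store *s, const int *refs, int nr_refs, time_t cutoff)
{
	bool keep[MAX_OBJS] = { false };
	int pruned = 0;

	for (int i = 0; i < nr_refs; i++)
		mark(s, refs[i], keep);
	for (int id = 0; id < MAX_OBJS; id++)
		if (s->objs[id].exists && s->objs[id].mtime > cutoff)
			mark(s, id, keep);
	for (int id = 0; id < MAX_OBJS; id++) {
		if (s->objs[id].exists && !keep[id]) {
			s->objs[id].exists = false;
			pruned++;
		}
	}
	return pruned;
}
```

An old tree and blob survive when a fresh but still unreferenced commit points at them, while an unrelated old blob is pruned; re-writing an old object likewise rescues it by bumping its mtime.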

But fundamentally none of this is atomic. You can have an old tree, and
while you're pruning somebody writes a new commit referencing it and
sticks that in a ref. It's more common if your grace period is "now",
but it can still happen with any grace period.
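The non-atomic window can be shown as an explicit interleaving. In this toy model (all names hypothetical, and a real pruner checks objects one by one rather than in a single decision), the pruner decides that a three-week-old tree is garbage under a two-week grace period. A writer then reuses that tree in a new commit and updates a ref, and the pruner finally acts on its stale decision. Freshening the tree does not help because the decision was already made; a longer grace period only makes it less likely that the reused object is old enough to be a candidate.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define DAY (24 * 60 * 60)

/* Toy state: one old tree, plus a commit and ref that may come to use it. */
struct repo {
	bool tree_exists;
	time_t tree_mtime;
	bool commit_exists;		/* hypothetical commit referencing the tree */
	bool ref_points_at_commit;
};

/* Pruner step 1: decide from what it can see right now. */
static bool pruner_decides_to_delete(const struct repo *r, time_t now, time_t grace)
{
	bool reachable = r->ref_points_at_commit;
	bool fresh = r->tree_mtime > now - grace;

	assert(grace >= 0);
	return !reachable && !fresh;
}

/* Writer: reuse the existing tree (freshening it), write a commit, update the ref. */
static void writer_commits_old_tree(struct repo *r, time_t now)
{
	r->tree_mtime = now;
	r->commit_exists = true;
	r->ref_points_at_commit = true;
}

/* Pruner step 2: act on the earlier decision without re-checking. */
static void pruner_deletes(struct repo *r, bool decision)
{
	if (decision)
		r->tree_exists = false;
}

/* Corrupt: a ref reaches a commit whose tree is gone. */
static bool repo_is_corrupt(const struct repo *r)
{
	return r->ref_points_at_commit && r->commit_exists && !r->tree_exists;
}
```

If the writer's steps land between the pruner's decision and its deletion, the repository ends up with a ref pointing at a commit whose tree is missing; with the writer running first, the tree is reachable and kept.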

-Peff
