Re: [RFC PATCH 0/4] move pruned objects to a separate repository

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Taylor Blau <me@xxxxxxxxxxxx> writes:
> We don't usually do pruning GC's except during manual intervention or
> upon request through a support ticket. But when we do it is often
> infeasible to lock the entire network's push traffic and reference
> updates. So it is not an unheard of event to encounter the race that I
> described above.
> 
> The idea is that, at least for non-sensitive pruning, we would move the
> pruned objects to a separate repository and hold them there until we
> could run `git fsck` on the repository after pruning and verify that the
> repository is intact. If it is, then the expired.git repository can be
> emptied, too, permanently removing the pruned objects. If not, the
> expired.git repository then becomes a donor for the missing objects,
> which are used to heal the corrupt main repository. Once *that* is done,
> and fsck comes back clean, then the expired.git repository can be
> removed.

Thanks for the information. Presumably, during such pruning GCs (because
manual intervention was needed or because of a support ticket), you
would use a cruft expiration of "now", not something like "14 days ago".
If so, more on this specific case later...

> > I think there is at least one more alternative that should be
> > considered, though: since the cruft pack is unlikely to have its objects
> > "resurrected" (since the reason why they're there is because they are
> > unreachable), it is likely that the objects that are pruned are exactly
> > the same as those in the craft pack. So it would be more efficient to
> > just unconditionally rename the cruft pack to the backup destination.
> 
> This isn't quite right. The contents that are written into the
> expired.git repository is everything that *didn't* end up in the cruft
> pack.

I reread what I wrote and realized that I didn't give a good description
of what I was thinking. So hopefully this is a better one: I was
thinking of the case in which GC is regularly run on a repo, say, every
14 days with a 14-day expiration time. So on the first run you would
have:

  a1.pack      a2.pack (+ .mtime)
  Reachable    Unreachable (cruft pack)

and on the second run, assuming this patch set is in effect:

  b1.pack      b2.pack (+ .mtime)          expired.git/b3.pack
  Reachable    Unreachable (cruft pack)    In expired.git

and my idea was that it is very likely that a2.pack and
expired.git/b3.pack are the same, so I thought that a feature in which
cruft packs could be moved instead of deleted would be more efficient.

Going back to the specific case at the start of this email. I now see
that my idea wouldn't work in that specific case, because you would want
to generate expired.git packs from a repository that is unlikely to have
cruft packs (or, at least, it is unlikely that the existing cruft packs
match exactly what you would write in expired.git).

If it is likely that cruft pack(s) would only be written in one place
(whether in the same repository or in expired.git, but not both),
another design option would be to be able to tell Git where to write
cruft packs, but not give Git the ability to write cruft packs both to
the same repository and expired.git. (Unless in your use case, Taylor,
you envision that you would occasionally need to write cruft packs in
both places.) That would simplify the UI and the code a bit, but not by
much (this patch set, as written, is already elegantly written), so I
don't feel too strongly about it and I would be happy with the current
pattern.





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux