Re: [RFC PATCH 0/4] move pruned objects to a separate repository

On Wed, Jun 29 2022, Taylor Blau wrote:

> Now that cruft packs are available in v2.37.0, here is an interesting
> application of that new feature to enable a two-phase object pruning
> approach.
>
> This came out of a discussion within GitHub about ways we could support
> storing a set of pruned objects in "limbo" so that they were not
> accessible from the repository which pruned them, but instead stored in
> a cruft pack in a separate repository which lists the original one as an
> alternate.
>
> This makes it possible to take the collection of all pruned objects and
> store them in a cruft pack in a separate repository. This repository
> (which I have been referring to as the "expired.git") can then be used
> as a donor repository for any missing objects (like the ones described
> by the race in [1]).
> [...]
> [1]: https://lore.kernel.org/git/YryF+vkosJOXf+mQ@nand.local/

I think the best description of that race on-list is this one by Jeff
King. If so, it would be nice to work it into a commit message (for
4/4):

	https://public-inbox.org/git/20190319001829.GL29661@xxxxxxxxxxxxxxxxxxxxx/

Downthread of that, starting at:

	https://public-inbox.org/git/878svjj4t5.fsf@xxxxxxxxxxxxxxxxxxx/

I describe a proposed mechanism to address the race condition, which
seems to me to be functionally the same as parts of what you're
proposing here. I.e. the "limbo" here is the same as the proposed "gc
quarantine".

The main difference is one that this RFC leaves on the table: how you'd
get these objects back into the non-cruft repository once they've been
erroneously/racily expired. I imagined that we'd add the quarantine as
a special alternate, read it last, and make the object reading code
aware that any object needed from such an alternate is one that we'd
need to COR over to our primary repository:

	https://public-inbox.org/git/8736lnxlig.fsf@xxxxxxxxxxxxxxxxxxx/
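
To make the "special alternate, read last" idea concrete, here is a
minimal sketch. It is Python for brevity, not git's C, and it is a toy
in-memory model rather than anything from this RFC or from git's object
reading code. Every name in it (ObjectStore, Repository, read_object,
the "quarantine" store, the "healed" list) is hypothetical. It only
shows the lookup order and the copy back into the primary store:

```python
class ObjectStore:
    """Toy object store: maps an object ID to its contents (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.objects = {}


class Repository:
    """Toy repository with ordinary alternates plus one special alternate.

    The special alternate (the "gc quarantine" / "cruft repository") is
    always consulted last. This is an assumed design, not git's behavior.
    """

    def __init__(self, primary, alternates=(), quarantine=None):
        self.primary = primary
        self.alternates = list(alternates)
        self.quarantine = quarantine
        self.healed = []  # object IDs copied back from the quarantine

    def read_object(self, oid):
        # Normal lookup: the primary store first, then ordinary alternates.
        for store in (self.primary, *self.alternates):
            if oid in store.objects:
                return store.objects[oid]

        # Last resort: the quarantine. A hit here means the object was
        # pruned but is still needed, so copy it back into the primary
        # store so that later reads no longer depend on the quarantine.
        if self.quarantine is not None and oid in self.quarantine.objects:
            data = self.quarantine.objects[oid]
            self.primary.objects[oid] = data
            self.healed.append(oid)
            return data

        raise KeyError(f"missing object {oid}")
```

A read that falls through to the quarantine returns the object and
leaves a copy in the primary store. A read of an object that exists
nowhere still fails, as it would today.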

Whereas it seems like you're imagining just having the "cruft pack"
repository around so that a support engineer can manually recover from
corruption, or have some other out-of-tree mechanism not part of this
RFC to (semi-?)automate that step.

If you haven't already, it would be nice if you could read that thread
and see whether what I'm describing there is essentially a superset of
what you have here, and whether any of the concerns Jeff King brought
up there are ones you think apply here.

Particularly as there was a reference to an off-list (presumably at
GitHub) discussion with Michael Haggerty about these sorts of races. I
don't know if either Jeff or Michael were involved in the discussions
you had.

I think that the mechanism I proposed there was subtly different from
what Jeff was concerned about being racy, but that thread was left
hanging as the last reply is from me trying to clarify that point.

So maybe I'm wrong, but I think if that was the case you'd also be wrong
about this approach being viable, so it would be nice to clear that up
:)

I'd also be very interested to know whether you have planned anything
like my proposed auto-healing via a special alternate. I think that
would allow aggressive pruning of live repositories not by fixing our
underlying race conditions, but by "leaning into" them as it were.

I.e. we'd race even more, but that would be OK, as we could always
auto-heal: on noticing "no, I'll actually need that", we'd COR the
relevant object(s) back from the "gc quarantine" (or your "cruft
repository").


