Re: Keeping unreachable objects in a separate pack instead of loose?

Nicolas Pitre <nico@xxxxxxxxxxx> · Tue, 12 Jun 2012 13:55:22 -0400 (EDT)

On Tue, 12 Jun 2012, Shawn Pearce wrote:

> On Tue, Jun 12, 2012 at 10:32 AM, Jeff King <peff@xxxxxxxx> wrote:
> > On Tue, Jun 12, 2012 at 01:30:07PM -0400, Nicolas Pitre wrote:
> >
> >> > > To make it "safe", the cruft packs would have to be searchable for
> >> > > object retrieval, but not during object creation.  That nuance would
> >> > > affect the core code in subtle ways and I'm not sure if that would be
> >> > > worth it ... just for the safe handling of cruft.
> >> >
> >> > Why is that? If you do a "repack -Ad", then any referenced objects will
> >> > have been retrieved and put into the new all-in-one pack. At that point,
> >> > by deleting the cruft pack, you are guaranteed to be deleting only
> >> > objects that are either unreferenced, or are duplicated in another pack.
> >>
> >> Now what if you fetch and a bunch of objects are already found in your
> >> cruft pack?  Right now, we search for the existence of any object before
> >> creating them, and if the cruft packs are searchable then such objects
> >> won't get uncruftified.
> >
> > Then those objects will remain in the cruft pack. Which is why, as I
> > said, it is not generally safe to just delete a cruft pack. However,
> > when you do a full repack, those objects will be copied into the new
> > pack (because they are referenced). Which is why I am claiming that it
> > is safe to remove cruft packs at that point.
> 
> But there is a race condition with a concurrent fetch and a concurrent
> repack. If that fetch needs those cruft objects, and sees them in the
> cruft pack, and the repack sees the references before the fetch, the
> repacker might delete things the fetch is about to reference and that
> will leave you with a corrupt repository.
> 
> I think we already have this race condition with loose unreachable
> objects whose mtimes are older than 2 weeks; they are removed by prune
> but may have just become reachable by a concurrent fetch that doesn't
> overwrite them because they already exist, and doesn't update the
> mtime because they aren't writable.

Splat!

Nicolas