Re: Preserve/Prune Old Pack Files

Jeff King <peff@xxxxxxxx> · Tue, 10 Jan 2017 04:14:26 -0500

On Mon, Jan 09, 2017 at 09:17:56AM -0700, Martin Fick wrote:

> > I suspect the name-change will break a few tools that you
> > might want to use to look at a preserved pack (like
> > verify-pack).  I know that's not your primary use case,
> > but it seems plausible that somebody may one day want to
> > use a preserved pack to try to recover from corruption. I
> > think "git index-pack --stdin
> > <objects/packs/preserved/pack-123.old-pack" could always
> > be a last-resort for re-admitting the objects to the
> > repository.
> 
> or even a simple manual rename/move back to its original
> place?

Yes, that would work. There's not a tool to do it, but it's a fairly
straightforward transformation.
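
To make that transformation concrete, here is a rough Python sketch (not
an existing git tool). It assumes preserved files are named with an
".old-<ext>" suffix, as in the "pack-123.old-pack" example above, and
that an index would be preserved as ".old-idx"; the function names and
the directory arguments are hypothetical.

```python
import os
import shutil

# Assumption: preserved files are named "<stem>.old-<ext>".
OLD_PREFIX = "old-"


def restored_name(preserved_name):
    """Map 'pack-123.old-pack' to 'pack-123.pack'; None for other names."""
    stem, dot, ext = preserved_name.rpartition(".")
    if not dot or not ext.startswith(OLD_PREFIX) or len(ext) == len(OLD_PREFIX):
        return None
    return stem + "." + ext[len(OLD_PREFIX):]


def restore_preserved(preserved_dir, pack_dir):
    """Move preserved files back into pack_dir under their original names.

    Index files are moved last, on the assumption that a reader should
    not find an index before its pack is in place.
    Returns the restored file names in the order they were moved.
    """
    moved = []
    names = sorted(os.listdir(preserved_dir),
                   key=lambda n: (n.endswith(".old-idx"), n))
    for name in names:
        target = restored_name(name)
        if target is None:
            continue  # not a preserved file, leave it alone
        dest = os.path.join(pack_dir, target)
        if os.path.exists(dest):
            continue  # never clobber a live file
        shutil.move(os.path.join(preserved_dir, name), dest)
        moved.append(target)
    return moved
```

Something like restore_preserved("objects/pack/preserved", "objects/pack")
would be the whole "tool"; running verify-pack on the result afterwards
seems prudent.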

> [loose objects]
> Where would you suggest we store those?  Maybe under 
> ".git/objects/preserved/<xx>/<sha1>"?  Do they need to be 
> renamed also somehow to avoid a find?

It would make sense to me to have a single "preserved" root, with
"<xx>/<sha1>.old" and "packs/pack-<sha1>.old-pack" together under it.

You could also move the objects out of objects/ entirely. Say, to
".git/preserved-objects" or something. Then you could probably do away
with the filename munging altogether, and "restoring" an object or pack
would be a simple "mv" or "cp" (or you could even add preserved-objects
to $GIT_ALTERNATE_OBJECT_DIRECTORIES if you wanted to do a single
operation looking at both sets).
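
A small Python sketch of the alternates idea, in case it helps. It
assumes a hypothetical ".git/preserved-objects" directory that mirrors
the normal objects/ layout (no renamed files); the helper name is made
up, and only the GIT_ALTERNATE_OBJECT_DIRECTORIES variable itself is
real.

```python
import os


def env_with_preserved(git_dir, base_env=None):
    """Return a copy of the environment with the preserved area added
    as an alternate object directory.

    Assumption: <git_dir>/preserved-objects exists and uses the same
    layout as <git_dir>/objects.
    """
    env = dict(os.environ if base_env is None else base_env)
    preserved = os.path.abspath(os.path.join(git_dir, "preserved-objects"))
    existing = env.get("GIT_ALTERNATE_OBJECT_DIRECTORIES")
    # Entries are separated like PATH entries, so append rather than replace.
    env["GIT_ALTERNATE_OBJECT_DIRECTORIES"] = (
        existing + os.pathsep + preserved if existing else preserved
    )
    return env


# Example use (needs git and a real repository, so it is left commented out):
#   import subprocess
#   subprocess.run(["git", "cat-file", "-t", some_object_id],
#                  env=env_with_preserved(".git"))
```

The point is that a one-off command sees both sets of objects without
anything being moved back into objects/.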

That's all outside the scope of your original purpose (which I think was
just to keep the files _somewhere_ so that the open descriptor stays
valid on NFS). But maybe it would make other related things more
convenient. I dunno. I'm just speaking off the top of my head.

> > That's _way_ more complicated than your problem, and as I
> > said, I do not have a finished solution. But it seems
> > like they touch on a similar concept (a post-delete
> > holding area for objects). So I thought I'd mention it in
> > case it spurs any brilliance.
> 
> I agree, this is a problem I have wanted to solve also.  I 
> think having a "preserved" directory does open the door to 
> such "recovery" solutions, although I think you would 
> actually want to modify the many read code paths to fall 
> back to looking at the preserved area and performing 
> immediate "recovery" of the pack file if it ends up being 
> needed.

In my (admittedly not very concrete) plan, the read code paths
_wouldn't_ know to look in the preserved area. It would be up to the
repacking process to roll back in case of a race. That does open a period
(between the faux delete and the rollback) where readers may be broken.
But that's much better than the state today, which is that the readers
are broken, and that breakage persists forever.

But there may be other better ways of doing it.  What we're really
talking about is a transactional system where neither side locks (or at
least not for an appreciable amount of time), and one side is capable of
falling back and modifying its operation when there's a relevant race.
There's probably some research in this area and some standard solutions,
but it's not an area I'm overly familiar with (and building any solution
on top of POSIX filesystem semantics adds an extra challenge).

> That's a lot of work, but having the packs (and 
> eventually the loose objects) preserved into a location 
> where no new references will be built to depend on them is 
> likely the first step.  Does the name "preserved" do well for 
> that use case also, or would there be some better name?  What 
> would a transactional system call them?

I wasn't going to bikeshed, but since you ask...:)

"preserved" to me sounds like something we'd be keeping forever. These
objects are more in a "pending delete" state, or a purgatory. Maybe
something along those lines would be more appropriate.

-Peff