On Sat, Jan 12, 2013 at 06:39:52AM +0530, Sitaram Chamarty wrote:

> > 1. The repo has a ref R pointing at commit X.
> >
> > 2. A user starts a push to another ref, Q, of commit Y that builds
> >    on X. Git advertises ref R, so the sender knows they do not need
> >    to send X, but only Y. The user then proceeds to send the
> >    packfile (which might take a very long time).
> >
> > 3. Meanwhile, another user deletes ref R. X becomes unreferenced.
>
> The gitolite logs show that no deletion of refs has happened.

To be pedantic, step 3 could also be rewinding R to a commit before X.
Anything that causes X to become unreferenced.

> > There is a race with simultaneously deleting and packing refs. It
> > doesn't cause object db corruption, but it will cause refs to
> > "rewind" back to their packed versions. I have seen that one in
> > practice (though relatively rare). I fixed it in b3f1280, which is
> > not yet in any released version.
>
> This is for the packed-refs file right? And it could result in a ref
> getting deleted right?

Yes, if the ref was not previously packed, it could result in the ref
being deleted entirely.

> I said above that the gitolite logs say no ref was deleted. What if
> the ref "deletion" happened because of this race, making the rest of
> your 4-step scenario above possible?

It's possible. I do want to highlight how unlikely it is, though.

> > up in the middle, or fsck rejects the pack). We have historically left
>
> fsck... you mean if I had 'receive.fsckObjects' true, right? I don't.
> Should I? Would it help this overall situation? As I understand it,
> that's only about the internals of each object to check corruption,
> and cannot detect a *missing* object on the local object store.

Right, I meant if you have receive.fsckObjects on. It won't help this
situation at all, as we already do a connectivity check separate from
the fsck. But I do recommend it in general, just because it helps
catch bad objects before they get disseminated to a wider audience (at
which point it is often infeasible to rewind history). And it has
found git bugs (e.g., null sha1s in tree entries). (A one-liner for
turning it on is at the end of this mail.)

> > At GitHub, we've taken to just cleaning them up aggressively (I
> > think after an hour), though I am tempted to put in an optional
> > signal/atexit
>
> OK; I'll do the same then. I suppose a cron job is the best way; I
> didn't find any config for expiring these files.

If you run "git prune --expire=1.hour.ago", it should prune stale
tmp_pack_* files more than an hour old. But you may not be comfortable
with such a short expiration for the objects themselves. :) (A sketch
of a tmp-only cron cleanup is at the end of this mail.)

> Thanks again for your help. I'm going to treat it (for now) as a
> disk/fs error after hearing from you about the other possibility I
> mentioned above, although I find it hard to believe one repo can be
> hit by *two* races occurring together!

Yeah, the race seems pretty unlikely (though it could be just the one
race with a rewind). As I said, I haven't actually ever seen it in
practice.

In my experience, though, disk/fs issues do not manifest as just
missing objects, but as corrupted packfiles (e.g., the packfile
directory entry ends up pointing to the wrong inode, which is easy to
see because the inode's content is actually a reflog). And then of
course with the packfile unreadable, you have missing objects. But
YMMV, depending on the fs and what's happened to the machine to cause
the fs problem.
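
For reference, turning on the fsck check is just a config knob; the
--system variant below is only one way to cover every repo on the box,
and "/path/to/repo.git" is a placeholder for your setup. A full fsck
is also a way to look for corrupt or missing objects by hand:

  # reject pushes that contain malformed objects; this checks object
  # *contents*, while missing objects are caught by the separate
  # connectivity check at push time
  git config --system receive.fsckObjects true

  # manual check of a single repo for corrupt or missing objects
  git --git-dir=/path/to/repo.git fsck --full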
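
And here is a rough sketch of the cron-job cleanup, assuming GNU find
and that the stale files are the tmp_pack_* ones under objects/pack/
(again, the repo path is a placeholder; loop over your repos as
needed). Unlike "git prune --expire=1.hour.ago", this touches only the
temporary files and leaves all objects alone:

  # remove temporary packfiles that have been sitting around for more
  # than an hour; objects and real packfiles are untouched
  find /path/to/repo.git/objects/pack -maxdepth 1 \
       -name 'tmp_pack_*' -mmin +60 -delete
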
-Peff