On Sat, Jan 12, 2013 at 06:39:52AM +0530, Sitaram Chamarty wrote:

> > 1. The repo has a ref R pointing at commit X.
> >
> > 2. A user starts a push to another ref, Q, of commit Y that builds
> >    on X. Git advertises ref R, so the sender knows they do not need
> >    to send X, but only Y. The user then proceeds to send the
> >    packfile (which might take a very long time).
> >
> > 3. Meanwhile, another user deletes ref R. X becomes unreferenced.
>
> The gitolite logs show that no deletion of refs has happened.

To be pedantic, step 3 could also be rewinding R to a commit before X.
Anything that causes X to become unreferenced.

> > There is a race with simultaneously deleting and packing refs. It
> > doesn't cause object db corruption, but it will cause refs to
> > "rewind" back to their packed versions. I have seen that one in
> > practice (though relatively rare). I fixed it in b3f1280, which is
> > not yet in any released version.
>
> This is for the packed-refs file right? And it could result in a ref
> getting deleted right?

Yes, if the ref was not previously packed, it could result in the ref
being deleted entirely.

> I said above that the gitolite logs say no ref was deleted. What if
> the ref "deletion" happened because of this race, making the rest of
> your 4-step scenario above possible?

It's possible. I do want to highlight how unlikely it is, though.

> > up in the middle, or fsck rejects the pack). We have historically left
>
> fsck... you mean if I had 'receive.fsckObjects' true, right? I don't.
> Should I? Would it help this overall situation? As I understand it,
> that's only about the internals of each object to check corruption,
> and cannot detect a *missing* object on the local object store.

Right, I meant if you have receive.fsckObjects on. It won't help this
situation at all, as we already do a connectivity check separate from
the fsck. But I do recommend it in general, just because it helps
catch bad objects before they get disseminated to a wider audience (at
which point it is often infeasible to rewind history). And it has
found git bugs (e.g., null sha1s in tree entries). (A one-liner for
turning it on is at the end of this mail.)

> > At GitHub, we've taken to just cleaning them up aggressively (I
> > think after an hour), though I am tempted to put in an optional
> > signal/atexit
>
> OK; I'll do the same then. I suppose a cron job is the best way; I
> didn't find any config for expiring these files.

If you run "git prune --expire=1.hour.ago", it should prune stale
tmp_pack_* files more than an hour old. But you may not be comfortable
with such a short expiration for the objects themselves. :) (A sketch
of a tmp-only cron cleanup is at the end of this mail.)

> Thanks again for your help. I'm going to treat it (for now) as a
> disk/fs error after hearing from you about the other possibility I
> mentioned above, although I find it hard to believe one repo can be
> hit by *two* races occurring together!

Yeah, the race seems pretty unlikely (though it could be just the one
race with a rewind). As I said, I haven't actually ever seen it in
practice.

In my experience, though, disk/fs issues do not manifest as just
missing objects, but as corrupted packfiles (e.g., the packfile
directory entry ends up pointing to the wrong inode, which is easy to
see because the inode's content is actually a reflog). And then of
course with the packfile unreadable, you have missing objects. But
YMMV, depending on the fs and what's happened to the machine to cause
the fs problem.
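
For reference, turning on the fsck check is just a config knob; the
--system variant below is only one way to cover every repo on the box,
and "/path/to/repo.git" is a placeholder for your setup. A full fsck
is also a way to look for corrupt or missing objects by hand:

  # reject pushes that contain malformed objects; this checks object
  # *contents*, while missing objects are caught by the separate
  # connectivity check at push time
  git config --system receive.fsckObjects true

  # manual check of a single repo for corrupt or missing objects
  git --git-dir=/path/to/repo.git fsck --full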
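
And here is a rough sketch of the cron-job cleanup, assuming GNU find
and that the stale files are the tmp_pack_* ones under objects/pack/
(again, the repo path is a placeholder; loop over your repos as
needed). Unlike "git prune --expire=1.hour.ago", this touches only the
temporary files and leaves all objects alone:

  # remove temporary packfiles that have been sitting around for more
  # than an hour; objects and real packfiles are untouched
  find /path/to/repo.git/objects/pack -maxdepth 1 \
       -name 'tmp_pack_*' -mmin +60 -delete
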
-Peff