On 12/19/2013 01:04 AM, Jeff King wrote: > On Wed, Dec 18, 2013 at 11:44:58PM +0100, Michael Haggerty wrote: > >> [While doing so, I got sidetracked by the question: what happens if a >> prune process deletes the "objects/XX" directory just the same moment >> that another process is trying to write an object into that directory? >> I think the relevant function is sha1_file.c:create_tmpfile(). It looks >> like there is a nonzero but very small race window that could result in >> a spurious "unable to create temporary file" error, but even then I >> don't think there would be any corruption or anything.] > > There's a race there, but I think it's hard to trigger. > > Our strategy with object creation is to call open, recognize ENOENT, > mkdir, and then try again. If the rmdir happens before our call to open, > then we're fine. If open happens first, then the rmdir will fail. > > But we don't loop on ENOENT. So if the rmdir happens in the middle, > after the mkdir but before we call open again, we'd fail, because we > don't treat ENOENT specially in the second call to open. That is > unlikely to happen, though, as prune would not be removing a directory > it did not just enter and clean up an object from (in which case we > would not have gotten the first ENOENT in the creator). [...] The way I read it, prune tries to delete the directory whether or not there were any files in it. So the race could be triggered by a single writer that wants to write an object to a not-yet-existent shard directory and a single prune process that encounters the directory between when it is created and when the object file is added. But that doesn't mean I disagree with your conclusion: > So it seems unlikely and the worst case is a temporary failure, not a > corruption. It's probably not worth caring too much about, but we could > solve it pretty easily by looping on ENOENT on creation. Regarding references: > On a similar note, I imagine that a simultaneous "branch foo/bar" and > "branch -d foo/baz" could race over the creation/deletion of > "refs/heads/foo", but I didn't look into it. Deleting a loose reference doesn't cause the directory containing it to be deleted. The directory is only deleted by pack-refs (and then only when a reference in the directory was just packed) or when there is an attempt to create a new reference that conflicts with the directory. So the question is whether the creation of a loose ref file is robust against the disappearance of a directory that it just created. And the answer is "no". It looks like there are a bunch of places where similar races occur involving references. And probably many others elsewhere in the code. (Any caller of safe_create_leading_directories() is a candidate problem point, and in fact that function itself has an internal race.) I've started fixing some of these but it might take a while. Michael -- Michael Haggerty mhagger@xxxxxxxxxxxx http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html