On Thu, Dec 19, 2013 at 05:33:55PM +0100, Michael Haggerty wrote:

> > But we don't loop on ENOENT. So if the rmdir happens in the middle,
> > after the mkdir but before we call open again, we'd fail, because we
> > don't treat ENOENT specially in the second call to open. That is
> > unlikely to happen, though, as prune would not be removing a directory
> > it did not just enter and clean up an object from (in which case we
> > would not have gotten the first ENOENT in the creator). [...]
>
> The way I read it, prune tries to delete the directory whether or not
> there were any files in it. So the race could be triggered by a single
> writer that wants to write an object to a not-yet-existent shard
> directory and a single prune process that encounters the directory
> between when it is created and when the object file is added.

Yes, that's true. It does make the race slightly more difficult than a
straight deletion because the prune has to catch it in the moment where
it exists but does not yet have an object. But it's still possible.

> But that doesn't mean I disagree with your conclusion:

I think we're in violent agreement at this point. :)

> Regarding references:
>
> > On a similar note, I imagine that a simultaneous "branch foo/bar" and
> > "branch -d foo/baz" could race over the creation/deletion of
> > "refs/heads/foo", but I didn't look into it.
>
> Deleting a loose reference doesn't cause the directory containing it to
> be deleted. The directory is only deleted by pack-refs (and then only
> when a reference in the directory was just packed) or when there is an
> attempt to create a new reference that conflicts with the directory. So
> the question is whether the creation of a loose ref file is robust
> against the disappearance of a directory that it just created.

Ah, right, I forgot we leave the directories sitting around after deletion.
So we may run into a collision with another creator, but by definition
we would have a D/F conflict with such a creator anyway, so we cannot
both succeed. But we can hit the problem with pack-refs, as you note:

> And the answer is "no". It looks like there are a bunch of places where
> similar races occur involving references. And probably many others
> elsewhere in the code. (Any caller of safe_create_leading_directories()
> is a candidate problem point, and in fact that function itself has an
> internal race.) I've started fixing some of these but it might take a
> while.

Yeah, I think you'd have to teach safe_create_leading_directories to
atomically try-to-create-and-check-errno rather than stat+mkdir. And
then teach it to backtrack when an expected leading path goes missing
after we created it (so mkdir("foo"), then mkdir("foo/bar"), then step
back to mkdir("foo") if we got ENOENT).

I don't think the races are a big deal, though. As with the prune case,
we will ultimately fail to create the lockfile and get a temporary
failure rather than a corruption. So unless we actually have reports of
it happening (and I have seen none), it's probably not worth spending
much time on.

-Peff
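For illustration only, here is a rough sketch of the
try-to-create-and-check-errno idea with backtracking. It is not git's
safe_create_leading_directories(); the function name, the max_retries
parameter, and the fixed-size buffer are all made up for the example.
It also simplifies the backtracking: instead of stepping back one
component on ENOENT, it restarts from the first component, which is
easier to get right and should behave the same for short paths.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: create every directory leading up to the final
 * component of `path` (the part after the last '/' is left alone).
 * Returns 0 on success, -1 with errno set on failure.
 */
static int create_leading_dirs_retry(const char *path, int max_retries)
{
	char buf[4096];
	char *p;
	size_t len;
	int rc, saved, retries = max_retries;

	assert(path != NULL);
	len = strlen(path);
	if (len >= sizeof(buf)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	memcpy(buf, path, len + 1);

	p = buf;
	for (;;) {
		/* Skip separators, then find the end of the next component. */
		while (*p == '/')
			p++;
		p = strchr(p, '/');
		if (!p)
			return 0; /* only the final component is left */

		/* No stat() first: just try mkdir() and inspect errno. */
		*p = '\0';
		rc = mkdir(buf, 0777);
		saved = errno;
		*p = '/';

		/*
		 * EEXIST may also mean a non-directory is in the way; that
		 * surfaces later as ENOTDIR from the next mkdir() or open().
		 */
		if (rc == 0 || saved == EEXIST)
			continue;

		/*
		 * ENOENT means a parent we created (or saw) was removed by
		 * someone else in the meantime. Start over from the top,
		 * a bounded number of times.
		 */
		if (saved == ENOENT && retries-- > 0) {
			p = buf;
			continue;
		}

		errno = saved;
		return -1;
	}
}
```

A caller would still need its own retry around the final open(), since
the last directory can vanish between this function returning and the
file being created; that is the same window discussed above for the
object-writing case.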