On Tue, Nov 15, 2016 at 09:13:14AM -0500, Matt McCutchen wrote:

> I want to change this to something that won't leave an inconsistent
> state if interrupted. I've written code for this kind of thing before
> that sets GIT_INDEX_FILE and uses a temporary index file and "git
> write-tree". But I realized that if "git gc" runs concurrently, the
> generated tree could be deleted before it is used and the tool would
> fail. If I had a need to run "git commit-tree", it seems like I might
> even end up with a commit object with a broken reference to a tree.
> "git gc" normally doesn't delete objects that were created in the last
> 2 weeks, but if an identical tree was added to the object database more
> than 2 weeks ago by another operation and is unreferenced, it could be
> reused without updating its mtime and it could still get deleted.

Modern versions of git do two things to help with this:

  - any object which is referenced by a "recent" object (within the 2
    weeks) is also considered recent. So if you create a new commit
    object that points to a tree, even before you reference the commit
    that tree is protected

  - when an object write is optimized out because we already have the
    object, git will update the mtime on the file (loose object or
    packfile) to freshen it

This isn't perfect, though. You can decide to reference an existing
object just as it is being deleted. And the pruning process itself is
not atomic (and it's tricky to make it so, just because of what we're
promised by the filesystem).

> Is there a recommended way to avoid this kind of problem in add-on
> tools? (I searched the Git documentation and the web for information
> about races with "git gc" and didn't find anything useful.) If not, it
> seems to be a significant design flaw in "git gc", even if the problem
> is extremely rare in practice. I wonder if some of the built-in
> commands may have the same problem, though I haven't tried to test
> them.
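For reference, the temporary-index technique being discussed might look
something like this (a minimal sketch in a throwaway repo; the paths and
filenames are illustrative, not from the original mail):

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo hello > file.txt

# Stage into a private index file so the real index is untouched
# if the tool dies midway through.
export GIT_INDEX_FILE="$repo/.git/tmp-index"
git add file.txt
tree=$(git write-tree)
unset GIT_INDEX_FILE

# The resulting tree is unreferenced until something points at it;
# that window is where a concurrent "git gc" could in theory prune it.
git cat-file -t "$tree"   # prints "tree"
```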
> If this is confirmed to be a known problem affecting built-in
> commands, then at least I won't feel bad about introducing the same
> problem into add-on tools. :/

If you have long-running data (like a temporary index file that might
literally sit around for days or weeks), I think that is a potential
problem. And the solution is probably to use refs in some way to point
to your objects.

If you're worried about a short-term operation where somebody happens to
run git-gc concurrently, I agree it's a possible problem, but I suspect
it's something you can ignore in practice. For the most part, the
client-side git tools assume that one operation is happening at a time
in the repository. I think that largely holds for a developer working on
a single clone, and things just work in practice. Auto-gc makes that a
little sketchier, but historically it does not seem to have caused real
problems.

For a busy multi-user server, I recommend turning off auto-gc entirely,
and repacking manually with "-k" to be on the safe side.

-Peff
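A sketch of the refs-based approach suggested above: pin in-progress
objects with a temporary ref so gc treats them as reachable, and delete
the ref once the operation completes. The ref name and identity settings
here are illustrative, not part of the original mail:

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Dev"
git config user.email "dev@example.com"

echo data > f
git add f
tree=$(git write-tree)
commit=$(git commit-tree "$tree" -m "wip")

# Point a throwaway ref at the commit; gc will not prune anything
# reachable from it, even with an aggressive expiry.
git update-ref refs/tmp/my-operation "$commit"
git gc -q --prune=now
git cat-file -e "$commit"   # still present: exit status 0

# Clean up once the real ref update has landed.
git update-ref -d refs/tmp/my-operation
```

For the server-side advice, the corresponding settings would be along
the lines of "git config gc.auto 0" plus a periodic "git repack -a -d -k"
(-k keeps unreachable objects rather than dropping them).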