On Mon, May 20, 2013 at 09:37:30AM -0700, Junio C Hamano wrote: > Johan Herland <johan@xxxxxxxxxxx> writes: > > > For server-class installations we need ref storage that can be read > > (and updated?) atomically, and the current system of loose + packed > > files won't work since reading (and updating) more than a single file > > is not an atomic operation. Trivially, one could resolve this by > > dropping loose refs, and always using a single packed-refs file, but > > that would make it prohibitively expensive to update refs (the entire > > packed-refs file must be rewritten for every update). > > > > Now, observe that we don't have these race conditions in the object > > database, because it is an add-only immutable data store. > > > > What if we stored the refs as a tree object in the object database, > > referenced by a single (loose) ref? > > What is the cost of updating a single branch with that scheme? Yeah, I had the same thoughts as you here; I don't think this is really much better than updating a single packed-refs file. You may only have to touch log(N) of the entries to update the trees along the path to your ref (assuming you have a bushy refs/ tree; in practice, of course, you have a lot of entries in refs/heads/, so you do not quite get log behavior). So algorithmically it may be slightly better, but you pay a much higher constant, in that you are creating many tree objects along the path (and reading them on lookup). But more importantly, it introduces contention between two unrelated refs that are being updated. Even if we reconcile the differences automatically (e.g., with a "merge-and-retry" strategy), that is going to be a serious performance regression for a busy repository, as we repeatedly try to reconcile the serialized updates to the refs/ root tree. Any transactional solution needs to have the equivalent of ref-level locking (i.e., row-level locking in a relational setting). It is OK for two simultaneous updates to different rows to happen in random order; we don't care about that level of atomicity. The important thing is that the _readers_ see transactions in consistent order; if we know that update B started after update A finished, then we must never see update A without update B reflected. And I think that is more or less a solved problem in the database world. That is what prevents: git update-ref A B git update-ref -d B from creating a view where the reader sees neither A nor B (which is what can happen with the current filesystem view). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html