On Mon, May 20, 2013 at 6:59 PM, Jeff King <peff@xxxxxxxx> wrote: > On Mon, May 20, 2013 at 09:37:30AM -0700, Junio C Hamano wrote: >> Johan Herland <johan@xxxxxxxxxxx> writes: >> >> > For server-class installations we need ref storage that can be read >> > (and updated?) atomically, and the current system of loose + packed >> > files won't work since reading (and updating) more than a single file >> > is not an atomic operation. Trivially, one could resolve this by >> > dropping loose refs, and always using a single packed-refs file, but >> > that would make it prohibitively expensive to update refs (the entire >> > packed-refs file must be rewritten for every update). >> > >> > Now, observe that we don't have these race conditions in the object >> > database, because it is an add-only immutable data store. >> > >> > What if we stored the refs as a tree object in the object database, >> > referenced by a single (loose) ref? >> >> What is the cost of updating a single branch with that scheme? > > Yeah, I had the same thoughts as you here; I don't think this is really > much better than updating a single packed-refs file. You may only > have to touch log(N) of the entries to update the trees along the path > to your ref (assuming you have a bushy refs/ tree; in practice, of > course, you have a lot of entries in refs/heads/, so you do not quite > get log behavior). So algorithmically it may be slightly better, but you > pay a much higher constant, in that you are creating many tree objects > along the path (and reading them on lookup). True. > But more importantly, it introduces contention between two unrelated > refs that are being updated. Even if we reconcile the differences > automatically (e.g., with a "merge-and-retry" strategy), that is going > to be a serious performance regression for a busy repository, as we > repeatedly try to reconcile the serialized updates to the refs/ root > tree. > > Any transactional solution needs to have the equivalent of ref-level > locking (i.e., row-level locking in a relational setting). It is OK for > two simultaneous updates to different rows to happen in random order; we > don't care about that level of atomicity. The important thing is that > the _readers_ see transactions in consistent order; if we know that > update B started after update A finished, then we must never see update > A without update B reflected. And I think that is more or less a solved > problem in the database world. > > That is what prevents: > > git update-ref A B > git update-ref -d B > > from creating a view where the reader sees neither A nor B (which is > what can happen with the current filesystem view). All true. Just a last spasm from the dying notion that we could solve this in the filesystem, without using "proper" database... ...Johan -- Johan Herland, <johan@xxxxxxxxxxx> www.herland.net -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html