Re: [PATCH 00/17] Remove assumptions about refname lifetimes

Johan Herland <johan@xxxxxxxxxxx> · Mon, 20 May 2013 19:08:27 +0200

On Mon, May 20, 2013 at 6:59 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Mon, May 20, 2013 at 09:37:30AM -0700, Junio C Hamano wrote:
>> Johan Herland <johan@xxxxxxxxxxx> writes:
>>
>> > For server-class installations we need ref storage that can be read
>> > (and updated?) atomically, and the current system of loose + packed
>> > files won't work since reading (and updating) more than a single file
>> > is not an atomic operation. Trivially, one could resolve this by
>> > dropping loose refs, and always using a single packed-refs file, but
>> > that would make it prohibitively expensive to update refs (the entire
>> > packed-refs file must be rewritten for every update).
>> >
>> > Now, observe that we don't have these race conditions in the object
>> > database, because it is an add-only immutable data store.
>> >
>> > What if we stored the refs as a tree object in the object database,
>> > referenced by a single (loose) ref?
>>
>> What is the cost of updating a single branch with that scheme?
>
> Yeah, I had the same thoughts as you here; I don't think this is really
> much better than updating a single packed-refs file. You may only
> have to touch log(N) of the entries to update the trees along the path
> to your ref (assuming you have a bushy refs/ tree; in practice, of
> course, you have a lot of entries in refs/heads/, so you do not quite
> get log behavior). So algorithmically it may be slightly better, but you
> pay a much higher constant, in that you are creating many tree objects
> along the path (and reading them on lookup).

True.

> But more importantly, it introduces contention between two unrelated
> refs that are being updated. Even if we reconcile the differences
> automatically (e.g., with a "merge-and-retry" strategy), that is going
> to be a serious performance regression for a busy repository, as we
> repeatedly try to reconcile the serialized updates to the refs/ root
> tree.
>
> Any transactional solution needs to have the equivalent of ref-level
> locking (i.e., row-level locking in a relational setting). It is OK for
> two simultaneous updates to different rows to happen in random order; we
> don't care about that level of atomicity. The important thing is that
> the _readers_ see transactions in consistent order; if we know that
> update B started after update A finished, then we must never see update
> A without update B reflected. And I think that is more or less a solved
> problem in the database world.
>
> That is what prevents:
>
>   git update-ref A B
>   git update-ref -d B
>
> from creating a view where the reader sees neither A nor B (which is
> what can happen with the current filesystem view).

All true. Just a last spasm from the dying notion that we could solve
this in the filesystem, without using "proper" database...

...Johan

-- 
Johan Herland, <johan@xxxxxxxxxxx>
www.herland.net
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html