Re: Pathological performance with git remote rename and many tracking refs

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Sat, 16 Apr 2022 13:23:28 +0200

On Fri, Apr 15 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>
>> I actually wonder if this wouldn't be that hard, because if we have any
>> reflogs we could simply "replay" them by turning (again, I'm a bit rusty
>> on the exact lock dance);
>>
>>     lock(refs/remotes/origin/master);
>>     lock(refs/remotes/def/master);
>>     create(refs/remotes/def/master, refs/remotes/origin/master^{});
>
> Why deref?  Trying to be prepared for seeing a symbolic ref, or
> something?

I just meant "$(git rev-parse $NAME)", sorry about being unclear. It
would work for branches, but yeah, for annotated tag objects it would be
the wrong thing.

>>     delete(refs/remotes/origin/master);
>>     unlock(refs/remotes/origin/master);
>>     unlock(refs/remotes/def/master);
>>
>> Into instead doing:
>>
>>     lock(refs/remotes/origin/master);
>>     lock(refs/remotes/def/master);
>>     for from, to, msg ref_update(refs/remotes/origin/master):
>>         update(refs/remotes/def/master, from, to, msg);
>>     delete(refs/remotes/origin/master);
>>     unlock(refs/remotes/origin/master);
>>     unlock(refs/remotes/def/master);
>
> Would this (or the above, for that matter) work well when renaming 'foo'
> to 'foo/bar' (or the other way around), I wonder?

Hrm, probably not. Also for that we'll have tricky collisions with the
2nd level (master) v.s. 1st (origin/) won't we?

> Is the reasoning behind "replay each and every reflog entry" because
> you are trying to avoid depending on the implementation detail (i.e.
> "R=.git/logs/refs/remotes; mv $R/origin/master $R/def/master" works
> only with the files backend)?

It's a suggested workaround around the ref backend not understanding how
to "plug in" someting like a "mv", i.e. we can represent this sort of
thing as a sequence of normal ref operations.

> Unless we are willing to add a new
> ref backend method to help this natively, it would be a workable but
> an ultra-slow way to do so, as it would involve open, write, fsync,
> and close for each reflog entry.
>
>     ... goes and looks ...

It's far from optimal, but I think the main slowdown is due to the
(re)writing of the packed refs file, whereas the churn of doing a bunch
of locks, append-only writes etc. should be small by comparison, I
haven't benchmarked it though...

Also for my own reflogs (especially of remotes) some have very busy
logs, and some just 1-2 entries...

> Hmph, I am utterly confused.  refs/files-backend.c eventually
> dispatches to files_copy_or_rename_ref() when rename_ref method in
> ref_storage_be is used on the files backend, and the function
> already renames the reflog file without having to copy entry by
> entry.  We cannot just open a transaction, run many rename_refs and
> close it?

The whole problem (such as it is) is that that isn't part of the
transaction mechanism.

I.e. all of the logic to make this failure tolerant is missing, you're
renaming N refs and their logs, and we fail in the middle of it, does it
roll back to a consistent state? (it doesn't).

Of course this code could be made to handle such failures, address
consistency issues etc., but doing so would essentially be a
re-implementation of transactions.

So I'm wondering (perhaps unproductively) if having some way to
piggy-back the mechanism into the transaction mechanism in an easy way
would be a good way to do it...