Re: Pathological performance with git remote rename and many tracking refs

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Fri, 15 Apr 2022 14:26:16 +0200

On Fri, Apr 15 2022, brian m. carlson wrote:

> [[PGP Signed Part:Undecided]]
> On 2022-04-14 at 07:12:20, Ævar Arnfjörð Bjarmason wrote:
>> Aside from how we'd do renames with transactions, do you know about
>> clone.defaultRemoteName and --origin?
>
> Yes, I do know about that.  However, in my case, the repository is
> cloned before I even get a chance to touch it, so these options have no
> effect.  My dotfiles aren't even on the machine at that point.

*nod*

>> There was a (small) thread as a follow-up to that "rename --progress"
>> patch at the time, did you spot/read that?:
>> https://lore.kernel.org/git/220302.865yow6u8a.gmgdl@xxxxxxxxxxxxxxxxxxx/
>
> Yeah, I remember reading it at the time.
>
>> More generally, probably:
>> 
>>  1. Teach transactions about N operations on the same refname, which
>>     they'll currently die on, renames are one case.
>> 
>>  2. Be able to "hook in" to them, updating reflogs is one special-case,
>>     but we have the same inherent issue with updating config in lockstep
>>     with transactions.
>
> Yeah, that's one of the reasons I was thinking of implementing
> --no-reflogs: because in that case, the operation really is a
> create/delete operation and it doesn't require any additional logic in
> the transaction.
>
> My goal here is specifically not to rearchitect ref transactions and to
> implement a relatively simple solution.  Your response is indicating to
> me that updating reflogs is going to be the former and not the latter.

Yes, I think it would be nicer to solve the general problem, because
we'd otherwise end up with no-reflogs and have-reflogs paths through
these functions.

I'd also think you'd be stuck trying to instrument transactions in some
new way anyway, because wouldn't this thing be racy if reflogs for one
of the remotes appeared after you started you'd need to abort or deal
with them?

Or was the plan to:

 1. Create lock files for them all
 2. No reflogs?
 3. Rename..

Maybe that was what we did in any case, I'm rather rusty on the exact
procedure at the moment. I.e. did we do the mass-lock first?

Anyway.

I actually wonder if this wouldn't be that hard, because if we have any
reflogs we could simply "replay" them by turning (again, I'm a bit rusty
on the exact lock dance);

    lock(refs/remotes/origin/master);
    lock(refs/remotes/def/master);
    create(refs/remotes/def/master, refs/remotes/origin/master^{});
    delete(refs/remotes/origin/master);
    unlock(refs/remotes/origin/master);
    unlock(refs/remotes/def/master);

Into instead doing:

    lock(refs/remotes/origin/master);
    lock(refs/remotes/def/master);
    for from, to, msg ref_update(refs/remotes/origin/master):
    update(refs/remotes/def/master, from, to, msg);
    delete(refs/remotes/origin/master);
    unlock(refs/remotes/origin/master);
    unlock(refs/remotes/def/master);

I.e. simply replay each individual item in the reflog, basically using
the code in your builtin/stash.c series.

I didn't think of that before, and just thought of some
extra-transactional way to copy the reflog over as-is.

I think that multi-copy approach would run into
ref_update_reject_duplicates(), i.e. you can't have multiple updates
now, but that's a sanity assertion to make sure the transaction is
internally consistent.

In this case the N update() calls are semnatically the same as a
create() call in terms of aborting the transaction, i.e. we'd "fold"
them away, i.e. turn the sanity check into pretending N such updates
were the same as a single create(), so we'd delete on rollback.

Maybe that would work? Hrm...

Anyway, depending on the answer to if your --no-reflogs plan would have
any lock issues etc, perhaps an alternate plan would be:

    git remote rename --unsafe

Where we'd just "mv" the 1-2 relevant refs/remotes/X and
log/refs/remotes/X folders, followed by updating refs...