Re: [PATCH v2] fetch: limit shared symref check only for local branches

On Mon, May 16, 2022 at 7:01 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> "Orgad Shaneh via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
> > From: Orgad Shaneh <orgads@xxxxxxxxx>
> >
> > This check was introduced in 8ee5d73137f (Fix fetch/pull when run without
> > --update-head-ok, 2008-10-13) in order to protect against replacing the ref
> > of the active branch by mistake, for example by running git fetch origin
> > master:master.
> >
> > It was later extended in 8bc1f39f411 (fetch: protect branches checked out
> > in all worktrees, 2021-12-01) to scan all worktrees.
> >
> > This operation is very expensive (takes about 30s in my repository) when
> > there are many tags or branches, and it is executed on every fetch, even if
> > no local heads are updated at all.
> >
> > Limit it to protect only refs/heads/* to improve fetch performance.
>
> The point of the check is to prevent the index+working tree in the
> worktrees to go out of sync with HEAD, and HEAD by definition can
> point only into refs/heads/*, this change should be OK.
>
> It is surprising find_shared_symref() is so expensive, though.  If
> you have a dozen worktrees linked to the current repository, there
> are at most a dozen HEAD that point at various refs in refs/heads/
> namespace.  Even if you need to check a thousand ref_map elements,
> it should cost almost nothing if you build a hashmap to find matches
> with these dozen HEADs upfront, no?

I also had this idea, but I'm not familiar enough with the codebase to
implement it. I see you have already started on that.
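
For illustration only, here is a rough standalone sketch of that idea
combined with the refs/heads/* restriction from this patch. It is not
git's code: struct head_set and the functions head_set_add(),
head_set_contains(), head_set_clear() and is_checked_out() are all
hypothetical names, and a real implementation would presumably use
git's existing hashmap helpers rather than a hand-rolled table.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64 /* power of two; a handful of worktrees needs very little */

/* Hypothetical types, for illustration only (not git's API). */
struct head_entry {
	char *refname;           /* e.g. "refs/heads/main" */
	struct head_entry *next; /* chain of entries sharing a bucket */
};

struct head_set {
	struct head_entry *buckets[NBUCKETS];
};

/* FNV-1a string hash. */
unsigned int hash_str(const char *s)
{
	unsigned int h = 2166136261u;

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 16777619u;
	}
	return h;
}

/* Record one worktree's HEAD target; done once, before the ref_map loop. */
void head_set_add(struct head_set *set, const char *refname)
{
	unsigned int i = hash_str(refname) & (NBUCKETS - 1);
	size_t len = strlen(refname) + 1;
	struct head_entry *e = malloc(sizeof(*e));

	if (!e)
		abort();
	e->refname = malloc(len);
	if (!e->refname)
		abort();
	memcpy(e->refname, refname, len);
	e->next = set->buckets[i];
	set->buckets[i] = e;
}

/* Return 1 if refname was added to the set, 0 otherwise. */
int head_set_contains(const struct head_set *set, const char *refname)
{
	unsigned int i = hash_str(refname) & (NBUCKETS - 1);
	const struct head_entry *e;

	for (e = set->buckets[i]; e; e = e->next)
		if (!strcmp(e->refname, refname))
			return 1;
	return 0;
}

/*
 * Per-ref check: anything outside refs/heads/ cannot be a HEAD target,
 * so tags and remote-tracking refs are rejected by a prefix compare
 * before the set is consulted at all.
 */
int is_checked_out(const struct head_set *set, const char *refname)
{
	if (strncmp(refname, "refs/heads/", 11))
		return 0;
	return head_set_contains(set, refname);
}

/* Free every entry and leave the set empty. */
void head_set_clear(struct head_set *set)
{
	int i;

	for (i = 0; i < NBUCKETS; i++) {
		struct head_entry *e = set->buckets[i];

		while (e) {
			struct head_entry *next = e->next;

			free(e->refname);
			free(e);
			e = next;
		}
		set->buckets[i] = NULL;
	}
}
```

The set would be filled once from each worktree's HEAD, so every
ref_map element then costs one prefix compare and at most one hash
lookup, instead of re-reading the worktree HEADs for each ref.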

> Another thing that is surprising is that you say this loop is
> expensive when there are many tags or branches.  Do you mean it is
> expensive when there are many tags and branches that are updated, or
> it is expensive to merely have thousands of dormant tags and
> branches?  If the latter, I wonder if it is sensible to limit the
> check only to the refs that are going to be updated.

It's expensive even when *nothing* is updated. My repo has 44K tags
(13K of them annotated), 134 remote branches, and 4 worktrees (not
counting the main repo) with 33 local branches.

I counted the calls: find_shared_symref() was called 35755 times and
refs_read_raw_ref() 357585 times, i.e. about ten ref reads per call.

- Orgad


