程洋 <chengyang@xxxxxxxxxx> writes:
> We're running a Gerrit server cluster and use the pull-replication
> plugin to sync changes between master and slave.
>
> When a change is pushed to master, it notifies the slave, and the
> slave fetches it from master.
>
> But we found that in a big repository with 600k refs, a fetch takes
> 5-10 seconds even when fetching a 1-byte change. Here is the
> GIT_TRACE2_PERF
>
> I did an experiment fetching a ref that my slave already has, and we
> found that git rev-list takes 2 seconds. (I guess it tries to find
> the remote object among the reachable objects of the local refs one
> by one.)
>
> Is there any way to optimize this situation?

Do you need all those refs as refs -- or are you just looking to keep
the commits?

For the latter, we found a rather clever solution that we're looking to
upstream at some point: collect all the refs into a single 'archive'
ref whose history pulls the archived commits in as fake merge commits
(there's no actual conflict resolution happening -- we just use the
same tree over and over). We make each commit message look like
show-ref output.

For example, a single ref (refs/archive) pointing to a commit (A) with
contents:

    tree <some arbitrary tree>
    parent <B>
    [... 500 other commits 'merged' in ...]
    author <system user>
    committer <system user>

    deadbeef0123456788... refs/tags/very/old/release-1
    deadbeef0123456789... refs/tags/very/old/release-2

When we want to pull a ref back out of the archive, we have a process
in place to do so. This keeps the total number of refs down and
fetch/push performance within acceptable limits.

--
Sean Allred
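
[Editor's sketch] For readers who want something concrete, below is a
minimal sketch of how such an archive commit could be assembled with
git plumbing. The ref name refs/archive, the refs/tags/very/old/
pattern, and the Python helper itself are illustrative assumptions,
not the actual tooling described in the message above.

#!/usr/bin/env python3
# Sketch only: fold a batch of refs into one "archive" ref whose commit
# message is show-ref-style output, roughly as described above.
# refs/archive and the refs/tags/very/old/ pattern are assumptions.
import subprocess

def git(*args, stdin=None):
    # Run a git command and return its trimmed stdout; raise on failure.
    return subprocess.run(["git", *args], input=stdin, text=True,
                          check=True, capture_output=True).stdout.strip()

def archive_refs(pattern="refs/tags/very/old/", archive_ref="refs/archive"):
    # Refs to fold, one "<sha> <refname>" per line (show-ref style).
    listing = git("for-each-ref", "--format=%(objectname) %(refname)", pattern)
    if not listing:
        return

    # Peel annotated tags to commits so they can serve as merge parents.
    parents = [git("rev-parse", line.split()[0] + "^{commit}")
               for line in listing.splitlines()]

    # Keep the previous archive tip (if any) as the first parent.
    if subprocess.run(["git", "rev-parse", "-q", "--verify", archive_ref],
                      capture_output=True).returncode == 0:
        parents.insert(0, git("rev-parse", archive_ref))

    # Reuse the same (empty) tree for every archive commit; no conflict
    # resolution is needed because the tree never changes.
    tree = git("mktree", stdin="")

    # Create the fake merge commit with the ref listing as its message.
    cmd = ["commit-tree", tree]
    for parent in parents:
        cmd += ["-p", parent]
    commit = git(*cmd, stdin=listing + "\n")

    # Point the archive ref at it; the folded refs can then be deleted
    # (e.g. with "git update-ref -d") in a separate, deliberate step.
    git("update-ref", archive_ref, commit)

if __name__ == "__main__":
    archive_refs()

Pulling a ref back out would then amount to finding its "<sha> <refname>"
line in the message of one of the commits reachable from refs/archive and
recreating it with git update-ref; the exact process Sean describes is not
shown here.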