程洋 <chengyang@xxxxxxxxxx> writes:
> We're running a Gerrit server cluster and use the pull-replication
> plugin to sync changes between master and slave.
>
> When a change is pushed to master, it notifies the slave, and the
> slave fetches it from master.
>
> But we found that in a big repository with 600k refs, a fetch takes
> 5-10 seconds even when fetching a 1-byte change. Here is the
> GIT_TRACE2_PERF
>
> I did an experiment fetching a ref that my slave already has, and we
> found that git rev-list takes 2 seconds. (I guess it tries to find
> the remote object among the reachable objects of the local refs one
> by one.)
>
> Is there any way to optimize this situation?

Do you need all those refs as refs -- or are you just looking to keep
the commits?

For the latter, we found a rather clever solution that we're looking to
upstream at some point: collect all the refs into a single 'archive'
ref whose history pulls the archived commits in as fake merge commits
(there's no actual conflict resolution happening -- we just use the
same tree over and over). We make each commit message look like
show-ref output.

For example, a single ref (refs/archive) pointing to a commit (A) with
contents:

    tree <some arbitrary tree>
    parent <B>
    [... 500 other commits 'merged' in ...]
    author <system user>
    committer <system user>

    deadbeef0123456788... refs/tags/very/old/release-1
    deadbeef0123456789... refs/tags/very/old/release-2

When we want to pull a ref back out of the archive, we have a process
in place to do so. This keeps the total number of refs down and
fetch/push performance within acceptable limits.

--
Sean Allred
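
[Editor's sketch] For readers who want something concrete, below is a
minimal sketch of how such an archive commit could be assembled with
git plumbing. The ref name refs/archive, the refs/tags/very/old/
pattern, and the Python helper itself are illustrative assumptions,
not the actual tooling described in the message above.

#!/usr/bin/env python3
# Sketch only: fold a batch of refs into one "archive" ref whose commit
# message is show-ref-style output, roughly as described above.
# refs/archive and the refs/tags/very/old/ pattern are assumptions.
import subprocess

def git(*args, stdin=None):
    # Run a git command and return its trimmed stdout; raise on failure.
    return subprocess.run(["git", *args], input=stdin, text=True,
                          check=True, capture_output=True).stdout.strip()

def archive_refs(pattern="refs/tags/very/old/", archive_ref="refs/archive"):
    # Refs to fold, one "<sha> <refname>" per line (show-ref style).
    listing = git("for-each-ref", "--format=%(objectname) %(refname)", pattern)
    if not listing:
        return

    # Peel annotated tags to commits so they can serve as merge parents.
    parents = [git("rev-parse", line.split()[0] + "^{commit}")
               for line in listing.splitlines()]

    # Keep the previous archive tip (if any) as the first parent.
    if subprocess.run(["git", "rev-parse", "-q", "--verify", archive_ref],
                      capture_output=True).returncode == 0:
        parents.insert(0, git("rev-parse", archive_ref))

    # Reuse the same (empty) tree for every archive commit; no conflict
    # resolution is needed because the tree never changes.
    tree = git("mktree", stdin="")

    # Create the fake merge commit with the ref listing as its message.
    cmd = ["commit-tree", tree]
    for parent in parents:
        cmd += ["-p", parent]
    commit = git(*cmd, stdin=listing + "\n")

    # Point the archive ref at it; the folded refs can then be deleted
    # (e.g. with "git update-ref -d") in a separate, deliberate step.
    git("update-ref", archive_ref, commit)

if __name__ == "__main__":
    archive_refs()

Pulling a ref back out would then amount to finding its "<sha> <refname>"
line in the message of one of the commits reachable from refs/archive and
recreating it with git update-ref; the exact process Sean describes is not
shown here.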