Re: [PATCH 2/5] fetch: avoid lookup of commits when not appending to FETCH_HEAD

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 23, 2022 at 09:18:05AM -0500, Derrick Stolee wrote:
> On 2/23/2022 7:35 AM, Patrick Steinhardt wrote:
> > When fetching from a remote repository we will by default write what has
> > been fetched into the special FETCH_HEAD reference. The order in which
> > references are written depends on whether the reference is for merge or
> > not, which, despite some other conditions, is also determined based on
> > whether the old object ID the reference is being updated from actually
> > exists in the repository.
> > 
> > To write FETCH_HEAD we thus loop through all references thrice: once for
> > the references that are about to be merged, once for the references that
> > are not for merge, and finally for all references that are ignored. For
> > every iteration, we then look up the old object ID to determine whether
> > the referenced object exists so that we can label it as "not-for-merge"
> > if it doesn't exist. It goes without saying that this can be expensive
> > in case where we are fetching a lot of references.
> > 
> > While this is hard to avoid in the case where we're writing FETCH_HEAD,
> > users can in fact ask us to skip this work via `--no-write-fetch-head`.
> > In that case, we do not care for the result of those lookups at all
> > because we don't have to order writes to FETCH_HEAD in the first place.
> > 
> > Skip this busywork in case we're not writing to FETCH_HEAD. The
> > following benchmark performs a mirror-fetch in a repository with about
> > two million references:
> > 
> >     Benchmark 1: git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD~)
> >       Time (mean ± σ):     75.388 s ±  1.942 s    [User: 71.103 s, System: 8.953 s]
> >       Range (min … max):   73.184 s … 76.845 s    3 runs
> > 
> >     Benchmark 2: git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD)
> >       Time (mean ± σ):     69.486 s ±  1.016 s    [User: 65.941 s, System: 8.806 s]
> >       Range (min … max):   68.864 s … 70.659 s    3 runs
> > 
> >     Summary
> >       'git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD)' ran
> >         1.08 ± 0.03 times faster than 'git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD~)'
> 
> I have a super-small nitpick here.
> 
> I see that you are using '-n' to name your experiments. These names
> are a bit long, especially since they are the same Git command but
> built at different commits. It would be enough to say the command
> you are testing before the stats and leave the names as "HEAD" and
> "HEAD~" (or, I typically use "new" and "old", respectively).

Fair enough, will change.

Patrick

> >  			/*
> > -			 * References in "refs/tags/" are often going to point
> > -			 * to annotated tags, which are not part of the
> > -			 * commit-graph. We thus only try to look up refs in
> > -			 * the graph which are not in that namespace to not
> > -			 * regress performance in repositories with many
> > -			 * annotated tags.
> > +			 * When writing FETCH_HEAD we need to determine whether
> > +			 * we already have the commit or not. If not, then the
> > +			 * reference is not for merge and needs to be written
> > +			 * to the reflog after other commits which we already
> > +			 * have. We're not interested in this property though
> > +			 * in case FETCH_HEAD is not to be updated, so we can
> > +			 * skip the classification in that case.
> >  			 */
> > -			if (!starts_with(rm->name, "refs/tags/"))
> > -				commit = lookup_commit_in_graph(the_repository, &rm->old_oid);
> > -			if (!commit) {
> > -				commit = lookup_commit_reference_gently(the_repository,
> > -									&rm->old_oid,
> > -									1);
> > -				if (!commit)
> > -					rm->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE;
> > +			if (fetch_head->fp) {
> > +				struct commit *commit = NULL;
> > +
> > +				/*
> > +				 * References in "refs/tags/" are often going to point
> > +				 * to annotated tags, which are not part of the
> > +				 * commit-graph. We thus only try to look up refs in
> > +				 * the graph which are not in that namespace to not
> > +				 * regress performance in repositories with many
> > +				 * annotated tags.
> > +				 */
> > +				if (!starts_with(rm->name, "refs/tags/"))
> > +					commit = lookup_commit_in_graph(the_repository, &rm->old_oid);
> > +				if (!commit) {
> > +					commit = lookup_commit_reference_gently(the_repository,
> > +										&rm->old_oid,
> > +										1);
> > +					if (!commit)
> > +						rm->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE;
> > +				}
> >  			}
> Looks good. Most of the diff is whitespace.
> 
> Thanks,
> -Stolee

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux