On Tue, Dec 4, 2018 at 2:42 PM Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote: > > When fetching into a repository, a connectivity check is first made by > check_exist_and_connected() in builtin/fetch.c that runs: > > git rev-list --objects --stdin --not --all --quiet <(list of objects) > > If the client repository has many refs, this command can be slow, > regardless of the nature of the server repository or what is being > fetched. A profiler reveals that most of the time is spent in > setup_revisions() (approx. 60/63), and of the time spent in > setup_revisions(), most of it is spent in parse_object() (approx. > 49/60). This is because setup_revisions() parses the target of every ref > (from "--all"), and parse_object() reads the buffer of the object. > > Reading the buffer is unnecessary if the repository has a commit graph > and if the ref points to a commit (which is typically the case). This > patch uses the commit graph wherever possible; on my computer, when I > run the above command with a list of 1 object on a many-ref repository, > I get a speedup from 1.8s to 1.0s. > > Another way to accomplish this effect would be to modify parse_object() > to use the commit graph if possible; however, I did not want to change > parse_object()'s current behavior of always checking the object > signature of the returned object. > > Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx> > --- > This is on sb/more-repo-in-api because I'm using the repo_parse_commit() > function. This is a mere nicety, not strictly required. Before we had parse_commit(struct commit *) which would accomplish the same, (and we'd still have that afterwards as a #define falling back onto the_repository). As the function get_reference() is not the_repository safe as it contains a call to is_promisor_object() that is repository agnostic, I think it would be fair game to not depend on that series. I am not complaining, though. > A colleague noticed this issue when handling a mirror clone. > > Looking at the bigger picture, the speed of the connectivity check > during a fetch might be further improved by passing only the negotiation > tips (obtained through --negotiation-tip) instead of "--all". This patch > just handles the low-hanging fruit first. > --- > revision.c | 15 ++++++++++++++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/revision.c b/revision.c > index b5108b75ab..e7da2c57ab 100644 > --- a/revision.c > +++ b/revision.c > @@ -212,7 +212,20 @@ static struct object *get_reference(struct rev_info *revs, const char *name, > { > struct object *object; > > - object = parse_object(revs->repo, oid); > + /* > + * If the repository has commit graphs, repo_parse_commit() avoids > + * reading the object buffer, so use it whenever possible. > + */ > + if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT) { > + struct commit *c = lookup_commit(revs->repo, oid); > + if (!repo_parse_commit(revs->repo, c)) > + object = (struct object *) c; > + else > + object = NULL; Would it make sense in this case to rely on parse_object below instead of assigning NULL? The reason for that would be that when lookup_commit returns NULL, we would try more broadly. AFAICT oid_object_info doesn't take advantage of the commit graph, but just looks up the object header, which is still less than completely parsing it. Then lookup_commit is overly strict, as it may return NULL as when there still is a type mismatch (I don't think a mismatch could happen here, as both rely on just the object store, and not the commit graph.), so this would be just defensive programming for the sake of it. I dunno. struct commit *c; if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT && (c = lookup_commit(revs->repo, oid)) && !repo_parse_commit(revs->repo, c)) object = (struct object *) c; else object = parse_object(revs->repo, oid); So with all that said, I still think this is a good patch. Thanks, Stefan > + } else { > + object = parse_object(revs->repo, oid); > + } > + > if (!object) { > if (revs->ignore_missing) > return object; > -- > 2.19.0.271.gfe8321ec05.dirty >