On Sun, Sep 15, 2019 at 02:18:02PM -0700, Masaya Suzuki wrote: > During git-fetch, the client checks if the advertised tags' OIDs are > already in the fetch request's want OID set. This check is done in a > linear scan. For a repository that has a lot of refs, repeating this > scan takes 15+ minutes. In order to speed this up, create a oid_set for > other refs' OIDs. Good find. I was curious why nobody noticed it sooner, but I think a key element in your chromium example is the use of "--mirror", which brings in a bunch of refs which would not normally be grabbed. There are only about 17k heads and tags, but over 1.5M entries in refs/changes. Quadratically speaking, that's almost 8000 times worse, so your 15-minute delay is only 1/10th of a second in the non-mirror case. Mostly I was wondering if this might have been a recent regression. But it looks rather old: > -static int will_fetch(struct ref **head, const unsigned char *sha1) > +static void create_fetch_oidset(struct ref **head, struct oidset *out) > { > struct ref *rm = *head; > while (rm) { > - if (hasheq(rm->old_oid.hash, sha1)) > - return 1; > + oidset_insert(out, &rm->old_oid); > rm = rm->next; > } > - return 0; > } This function comes from cf7f929a10 (Teach git-fetch to grab a tag at the same time as a commit, 2008-03-02). As explained in that commit message, this technique can't actually get all the tags we're looking for. It was around the same time, in 348e390b17 (Teach fetch-pack/upload-pack about --include-tag, 2008-03-03), that we added an extension to do this reliably. So it _might_ even be possible to get rid of this code entirely, if we don't care about pre-2008 versions of Git (we'd still get the right answer from them; it would just take an extra round-trip). That said, it seems like a good idea to do this obvious and safe conversion in the near term, and we can look at removing it entirely as a separate topic. The cost of the oidset isn't too bad (~30MB, I guess even for that pathological chromium case). > @@ -324,6 +324,7 @@ static void find_non_local_tags(const struct ref *refs, > > refname_hash_init(&existing_refs); > refname_hash_init(&remote_refs); > + create_fetch_oidset(head, &fetch_oids); > > for_each_ref(add_one_refname, &existing_refs); > for (ref = refs; ref; ref = ref->next) { > @@ -340,9 +341,9 @@ static void find_non_local_tags(const struct ref *refs, > if (item && > !has_object_file_with_flags(&ref->old_oid, > OBJECT_INFO_QUICK) && > - !will_fetch(head, ref->old_oid.hash) && > + !oidset_contains(&fetch_oids, &ref->old_oid) && > !has_object_file_with_flags(&item->oid, OBJECT_INFO_QUICK) && > - !will_fetch(head, item->oid.hash)) > + !oidset_contains(&fetch_oids, &item->oid)) > clear_item(item); > item = NULL; > continue; One thing to check is whether the list we feed to will_fetch ever gets updated in the loop that checks it (i.e., do we need to be adding or clearing elements in the oidset). I _think_ the answer is no, because we modify only the "refs" list in that loop (via clear_item()), but the will_fetch() check is on the head/tail we receive (which isn't updated until later). Both of those lists are passed into the function, but I didn't trace it all the way up to make sure they are not aliases. -Peff