The old code called string_list_lookup(), and if that failed, called
string_list_insert(), thus doing the bisection search through the
string list twice in the latter code path. Instead, just call
string_list_insert() right away: if an entry for that peer reference
name already existed, its util pointer is already non-NULL, which is
how we detect duplicates.

Of course this doesn't change the fact that the repeated
string_list_insert() calls make the function scale like O(N^2) if the
input reference list is not already approximately sorted.

Signed-off-by: Michael Haggerty <mhagger@xxxxxxxxxxxx>
---
The O(N^2) algorithm mentioned above essentially boils down to

    for each reference:
        insert reference into string_list (sorted)

Because each insertion requires later array entries to be shifted to
the right, the algorithm scales like O(N^2) if the entries are not
already approximately in order.

I can think of three easy ways to fix this (a rough sketch of the
third approach follows at the end of these notes):

* Use a hash map instead of a sorted string_list to record which
  entries have already been seen.

* If the order of the result doesn't matter and it doesn't matter
  *which* of the duplicates is retained, then insert all of the names
  into the string_list unsorted, sort them all in one go, then iterate
  through looking for duplicates and stringing the survivors together
  into a new linked list.

* If the order is important, or it is important that the *first* of
  the duplicates be retained, then iterate through the linked list
  twice. The first time, just add the entries to the string_list.
  Then sort the string_list using a stable sort (e.g., git_qsort()).
  Then iterate through the linked list a second time as in the
  current code.

See also my comments to commit 10.

However, I believe that the O(N^2) worst case is unlikely to bite in
most cases. This function is called from builtin/fetch.c:get_ref_map(),
and the input list is the concatenation of several sub-lists. Most
sub-lists come from resolving refspecs, and the last comes from
find_non_local_tags(). I believe that each of the sub-lists is sorted.
So the O(N^2) worst case could only occur if more than one of the
sub-lists is large and at least two of the large sub-lists are not in
the correct order relative to each other.

I tried to concoct such a test case by creating a repository with
about 20k branches and 20k tags, instrumenting ref_remove_duplicates(),
and then fetching with something like

    git fetch --prune origin refs/heads/*:refs/remotes/origin/* \
        refs/tags/*:refs/tags/*

Indeed, the running time of ref_remove_duplicates() became significant
when the order of the refspecs was reversed, but it was still less
than a second out of a much longer command invocation. So I didn't
bother fixing the O(N^2) algorithm. If somebody reports real-world
slowness here, we can always come back to it.
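For illustration only (this is not part of the patch, and the names
are made up): below is a minimal, self-contained sketch of the third
approach. It sorts an auxiliary array of (node, original position)
pairs so that plain qsort() behaves stably, marks every occurrence of
a name except the earliest one, and then rebuilds the list in its
original order. Everything git-specific is left out: the "tracks both"
die() check, transferring/freeing peer_ref, and the real struct ref
layout ("struct sketch_ref" and "peer_name" are hypothetical
stand-ins).

    #include <stdlib.h>
    #include <string.h>

    struct sketch_ref {
    	struct sketch_ref *next;
    	const char *peer_name;	/* stand-in for ref->peer_ref->name */
    	int dup;		/* set for entries to be dropped */
    };

    struct sort_entry {
    	struct sketch_ref *ref;
    	size_t order;		/* original position; makes qsort() stable */
    };

    static int cmp_entry(const void *a_, const void *b_)
    {
    	const struct sort_entry *a = a_, *b = b_;
    	int cmp = strcmp(a->ref->peer_name, b->ref->peer_name);
    	if (cmp)
    		return cmp;
    	/* tie-break on original position to get a stable order */
    	return a->order < b->order ? -1 : a->order > b->order;
    }

    static void dedup_by_peer_name(struct sketch_ref *head)
    {
    	size_t n = 0, i;
    	struct sketch_ref *r, **tail;
    	struct sort_entry *entries;

    	for (r = head; r; r = r->next)
    		n++;
    	if (n < 2)
    		return;

    	entries = malloc(n * sizeof(*entries));	/* error check omitted */
    	for (r = head, i = 0; r; r = r->next, i++) {
    		entries[i].ref = r;
    		entries[i].order = i;
    		r->dup = 0;
    	}
    	qsort(entries, n, sizeof(*entries), cmp_entry);

    	/*
    	 * Within each run of equal names, the first entry is the
    	 * earliest occurrence (thanks to the order tie-break), so
    	 * mark all of the others as duplicates.
    	 */
    	for (i = 1; i < n; i++)
    		if (!strcmp(entries[i].ref->peer_name,
    			    entries[i - 1].ref->peer_name))
    			entries[i].ref->dup = 1;
    	free(entries);

    	/*
    	 * Relink the survivors in their original order.  The head is
    	 * always the earliest occurrence of its own name, so it can
    	 * never be marked dup.  Dropped nodes are only unlinked here;
    	 * real code would also free them.
    	 */
    	tail = &head->next;
    	for (r = head->next; r; r = r->next)
    		if (!r->dup) {
    			*tail = r;
    			tail = &r->next;
    		}
    	*tail = NULL;
    }

Counting, marking, and relinking are all linear passes, so the sort
dominates at O(N log N); any of the three approaches above would avoid
the quadratic behavior in the same way.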
 remote.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/remote.c b/remote.c
index e9fedfa..a44e897 100644
--- a/remote.c
+++ b/remote.c
@@ -750,13 +750,15 @@ void ref_remove_duplicates(struct ref *ref_map)
 	struct string_list refs = STRING_LIST_INIT_NODUP;
 	struct string_list_item *item = NULL;
 	struct ref *prev = NULL, *next = NULL;
+
 	for (; ref_map; prev = ref_map, ref_map = next) {
 		next = ref_map->next;
 		if (!ref_map->peer_ref)
 			continue;
 
-		item = string_list_lookup(&refs, ref_map->peer_ref->name);
-		if (item) {
+		item = string_list_insert(&refs, ref_map->peer_ref->name);
+		if (item->util) {
+			/* Entry already existed */
 			if (strcmp(((struct ref *)item->util)->name,
 				   ref_map->name))
 				die("%s tracks both %s and %s",
@@ -767,11 +769,9 @@ void ref_remove_duplicates(struct ref *ref_map)
 			free(ref_map->peer_ref);
 			free(ref_map);
 			ref_map = prev; /* skip this; we freed it */
-			continue;
+		} else {
+			item->util = ref_map;
 		}
-
-		item = string_list_insert(&refs, ref_map->peer_ref->name);
-		item->util = ref_map;
 	}
 	string_list_clear(&refs, 0);
 }
-- 
1.8.4