Re: [PATCH] remote.c: avoid O(n^2) behavior in match_push_refs by using string_list

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 02, 2013 at 04:53:48PM -0700, Brandon Casey wrote:

> From: Brandon Casey <drafnel@xxxxxxxxx>
> 
> When pushing, each ref in the local repository must be paired with a
> ref advertised by the remote server.  Currently, this is performed by
> first applying the refspec to the local ref to transform the local ref
> into the name of the remote ref, and then performing a linear search
> through the list of remote refs to see if the remote ref was advertised
> by the remote system.
> 
> This has O(n) complexity and makes match_push_refs() be an O(n^2)
> operation.

Just to be sure I understand correctly, is this actually O(m*n) where
"m" is the number of local refs and "n" is the number of remote refs?

For a repository that repeatedly pushes everything it has to the remote,
we end up with m=n, but it would not necessarily be the case if you are
pushing a subset of your refs. But even pushing a small number of refs
into a repository with a very large number of refs would be
unnecessarily slow, as we would do several O(n) lookups which could be
O(log n). So it may speed things up even in the case of a normal-sized
repo pushing to a large one.

> Dry-run push of a repository with 121913 refs:
> 
>         before     after
> real    1m40.582s  0m0.804s
> user    1m39.914s  0m0.515s
> sys     0m0.125s   0m0.106s

Very nice. :)

> Signed-off-by: Brandon Casey <drafnel@xxxxxxxxx>
> ---
>  remote.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)

Patch itself looks good to me, although...

> @@ -1362,6 +1378,8 @@ int match_push_refs(struct ref *src, struct ref **dst,
>  		free(dst_name);
>  	}
>  
> +	string_list_clear(&ref_list, 0);
> +
>  	if (flags & MATCH_REFS_FOLLOW_TAGS)
>  		add_missing_tags(src, dst, &dst_tail);
>  
> @@ -1376,11 +1394,15 @@ int match_push_refs(struct ref *src, struct ref **dst,
>  
>  			src_name = get_ref_match(rs, nr_refspec, ref, send_mirror, FROM_DST, NULL);
>  			if (src_name) {
> -				if (!find_ref_by_name(src, src_name))
> +				if (!ref_list.nr)
> +					prepare_searchable_ref_list(src,
> +						&ref_list);
> +				if (!string_list_has_string(&ref_list, src_name))

This hunk threw me for a bit, as it looked like we were lazily
initializing ref_list in case we had not done so earlier. But we would
have cleared it mid-way through the function (in the hunk above), and it
is only that we are reusing the same ref_list for two different
purposes.

I do not feel strongly about it, but it might be a little more obvious
to just declare a new variable in the block, like:

diff --git a/remote.c b/remote.c
index 75255af..53bef82 100644
--- a/remote.c
+++ b/remote.c
@@ -1399,6 +1399,7 @@ int match_push_refs(struct ref *src, struct ref **dst,
 		add_missing_tags(src, dst, &dst_tail);
 
 	if (send_prune) {
+		struct string_list src_ref_index = STRING_LIST_INIT_NODUP;
 		/* check for missing refs on the remote */
 		for (ref = *dst; ref; ref = ref->next) {
 			char *src_name;
@@ -1409,15 +1410,15 @@ int match_push_refs(struct ref *src, struct ref **dst,
 
 			src_name = get_ref_match(rs, nr_refspec, ref, send_mirror, FROM_DST, NULL);
 			if (src_name) {
-				if (!ref_list.nr)
+				if (!src_ref_index.nr)
 					prepare_searchable_ref_list(src,
-						&ref_list);
-				if (!string_list_has_string(&ref_list, src_name))
+						&src_ref_index);
+				if (!string_list_has_string(&src_ref_index, src_name))
 					ref->peer_ref = alloc_delete_ref();
 				free(src_name);
 			}
 		}
-		string_list_clear(&ref_list, 0);
+		string_list_clear(&src_ref_index, 0);
 	}
 	if (errs)
 		return -1;

And similarly maybe call the outer ref_list dst_ref_index or something.
I also note that we don't do the lazy-prepare for the other loop. I
guess that is because we assume that "src" is always non-NULL?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]