Re: [PATCH] connected: always use partial clone optimization

On 2020.03.20 15:00, Jonathan Tan wrote:
[snip]
> The addition of the fast path might cause performance reductions in
> these cases:
> 
>  - If a partial clone or a fetch into a partial clone fails, Git will
>    fruitlessly run rev-list (it is expected that everything fetched
>    would go into promisor packs, so if that didn't happen, it is most
>    likely that rev-list will fail too).
> 
>  - Any connectivity checks done by receive-pack, in the (in my opinion,
>    unlikely) event that a partial clone serves receive-pack.

Agreed. A partial clone serving receive-pack doesn't match my
understanding of the usual partial clone workflow.

> I think that these cases are rare enough, and the performance reduction
> in this case minor enough (additional object DB access), that the
> benefit of avoiding a flag outweighs these.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
> ---
> This is the second half of the work I did previously [1]. Quoting from
> [1]:
> 
> > For example, a local fetch was sped up from 6.63s to 3.39s. The bulk of
> > the remaining time is spent in yet another connectivity check
> > (fetch_refs -> check_exist_and_connected) prior to the fetch - that will
> > hopefully be done in a subsequent patch.
> 
> This is the subsequent patch. (Note that the timings were done on
> another computer, so don't compare the timings from [1] and this patch
> directly.)
> 
> [1] https://lore.kernel.org/git/be1d6aa4c4fd8868f3682b73c01a92d3830534ad.1578802317.git.jonathantanmy@xxxxxxxxxx/
> ---
>  builtin/clone.c | 7 ++-----
>  builtin/fetch.c | 7 -------
>  connected.c     | 9 +++++++--
>  connected.h     | 9 ---------
>  4 files changed, 9 insertions(+), 23 deletions(-)
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 1ad26f4d8c..4b2b14ff61 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -672,8 +672,7 @@ static void update_remote_refs(const struct ref *refs,
>  			       const char *branch_top,
>  			       const char *msg,
>  			       struct transport *transport,
> -			       int check_connectivity,
> -			       int check_refs_are_promisor_objects_only)
> +			       int check_connectivity)
>  {
>  	const struct ref *rm = mapped_refs;
>  
> @@ -682,8 +681,6 @@ static void update_remote_refs(const struct ref *refs,
>  
>  		opt.transport = transport;
>  		opt.progress = transport->progress;
> -		opt.check_refs_are_promisor_objects_only =
> -			!!check_refs_are_promisor_objects_only;

I was curious if any other code uses this option; it appears not. And
you're removing the option from the struct later on, so that's good.

>  
>  		if (check_connected(iterate_ref_map, &rm, &opt))
>  			die(_("remote did not send all necessary objects"));
> @@ -1275,7 +1272,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  
>  	update_remote_refs(refs, mapped_refs, remote_head_points_at,
>  			   branch_top.buf, reflog_msg.buf, transport,
> -			   !is_local, filter_options.choice);
> +			   !is_local);
>  
>  	update_head(our_head_points_at, remote_head, reflog_msg.buf);
>  
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index bf6bab80fa..1097e1e512 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -908,13 +908,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
>  	if (!connectivity_checked) {
>  		struct check_connected_options opt = CHECK_CONNECTED_INIT;
>  
> -		if (filter_options.choice)
> -			/*
> -			 * Since a filter is specified, objects indirectly
> -			 * referenced by refs are allowed to be absent.
> -			 */
> -			opt.check_refs_are_promisor_objects_only = 1;
> -
>  		rm = ref_map;
>  		if (check_connected(iterate_ref_map, &rm, &opt)) {
>  			rc = error(_("%s did not send all necessary objects\n"), url);
> diff --git a/connected.c b/connected.c
> index 7e9bd1bc62..846f2e4eef 100644
> --- a/connected.c
> +++ b/connected.c
> @@ -52,7 +52,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
>  		strbuf_release(&idx_file);
>  	}
>  
> -	if (opt->check_refs_are_promisor_objects_only) {
> +	if (has_promisor_remote()) {
>  		/*
>  		 * For partial clones, we don't want to have to do a regular
>  		 * connectivity check because we have to enumerate and exclude
> @@ -71,13 +71,18 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
>  				if (find_pack_entry_one(oid.hash, p))
>  					goto promisor_pack_found;
>  			}
> -			return 1;
> +			/*
> +			 * Fallback to rev-list with oid and the rest of the
> +			 * object IDs provided by fn.
> +			 */
> +			goto no_promisor_pack_found;

Previously, we'd iterate until we failed to find an OID in a promisor
pack, in which case we'd return 1. Now we instead jump to the regular
(non-promisor-pack) check, which uses rev-list.
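
To double-check my reading of the new control flow, here is a minimal
standalone sketch. It is not Git code: the integer object IDs, the two
lookup tables, and the names contains(), slow_check(), and
check_connected_sketch() are all hypothetical stand-ins for the
promisor-pack lookup and the rev-list invocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for objects found in promisor packs. */
static const int promisor_objects[] = { 10, 11, 12 };
/* Hypothetical stand-in for objects a full rev-list walk would accept. */
static const int connected_objects[] = { 10, 11, 12, 20 };

/* Counts how often the slow (rev-list-like) path runs. */
static int rev_list_calls;

static int contains(const int *set, size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
		if (set[i] == id)
			return 1;
	return 0;
}

/* Slow path: stands in for spawning rev-list. Returns 0 on success. */
static int slow_check(const int *ids, size_t n)
{
	size_t total = sizeof(connected_objects) / sizeof(connected_objects[0]);

	rev_list_calls++;
	for (size_t i = 0; i < n; i++)
		if (!contains(connected_objects, total, ids[i]))
			return 1;
	return 0;
}

/* Returns 0 if every ID is considered connected, nonzero otherwise. */
static int check_connected_sketch(const int *ids, size_t n, int has_promisor)
{
	size_t total = sizeof(promisor_objects) / sizeof(promisor_objects[0]);
	size_t i = 0;

	assert(ids != NULL || n == 0);
	if (has_promisor) {
		/* Fast path: accept IDs as long as each is in a promisor pack. */
		while (i < n && contains(promisor_objects, total, ids[i]))
			i++;
		if (i == n)
			return 0;
		/*
		 * Old behavior: return 1 here. New behavior: fall back to
		 * the slow check with the failing ID and the remaining IDs.
		 */
	}
	return slow_check(ids + i, n - i);
}
```

With these stand-ins, an ID list that is fully covered by the promisor
table never reaches slow_check(), while a list containing an ID outside
that table is handed to slow_check() starting at the first miss.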

>  promisor_pack_found:
>  			;
>  		} while (!fn(cb_data, &oid));
>  		return 0;
>  	}
>  
> +no_promisor_pack_found:
>  	if (opt->shallow_file) {
>  		argv_array_push(&rev_list.args, "--shallow-file");
>  		argv_array_push(&rev_list.args, opt->shallow_file);
> diff --git a/connected.h b/connected.h
> index eba5c261ba..8d5a6b3ad6 100644
> --- a/connected.h
> +++ b/connected.h
> @@ -46,15 +46,6 @@ struct check_connected_options {
>  	 * during a fetch.
>  	 */
>  	unsigned is_deepening_fetch : 1;
> -
> -	/*
> -	 * If non-zero, only check that the top-level objects referenced by the
> -	 * wanted refs (passed in as cb_data) are promisor objects. This is
> -	 * useful for partial clones, where enumerating and excluding all
> -	 * promisor objects is very slow and the commit-walk itself becomes a
> -	 * no-op.
> -	 */
> -	unsigned check_refs_are_promisor_objects_only : 1;
>  };

And here's the struct field cleanup.

>  
>  #define CHECK_CONNECTED_INIT { 0 }
> -- 
> 2.25.1.696.g5e7596f4ac-goog
> 

This all looks good to me.

Reviewed-by: Josh Steadmon <steadmon@xxxxxxxxxx>


