Re: [PATCH v2 1/1] builtin/pack-objects.c: avoid iterating all refs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jacob,

On Wed, Jan 20, 2021 at 01:45:14PM +0100, Jacob Vosmaer wrote:
> In git-pack-objects, we iterate over all the tags if the --include-tag
> option is passed on the command line. For some reason this uses
> for_each_ref which is expensive if the repo has many refs. We should
> use for_each_tag_ref instead.

I don't think it's worth sending another version, but I would have liked
to see: "... because we can save time by only iterating over some of the
refs" at the end of this paragraph.

> Because the add_ref_tag callback will now only visit tags we
> simplified it a bit.
>
> The motivation for this change is that we observed performance issues
> with a repository on gitlab.com that has 500,000 refs but only 2,000
> tags. The fetch traffic on that repo is dominated by CI, and when we
> changed CI to fetch with 'git fetch --no-tags' we saw a dramatic
> change in the CPU profile of git-pack-objects. This lead us to this
> particular ref walk. More details in:
> https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746#note_483546598
>
> Signed-off-by: Jacob Vosmaer <jacob@xxxxxxxxxx>
> ---
>  builtin/pack-objects.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 2a00358f34..ad52c91bdb 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -2803,13 +2803,11 @@ static void add_tag_chain(const struct object_id *oid)
>  	}
>  }
>
> -static int add_ref_tag(const char *path, const struct object_id *oid, int flag, void *cb_data)
> +static int add_ref_tag(const char *tag, const struct object_id *oid, int flag, void *cb_data)
>  {
>  	struct object_id peeled;
>
> -	if (starts_with(path, "refs/tags/") && /* is a tag? */
> -	    !peel_ref(path, &peeled)    && /* peelable? */
> -	    obj_is_packed(&peeled)) /* object packed? */
> +	if (!peel_ref(tag, &peeled) && obj_is_packed(&peeled))
>  		add_tag_chain(oid);
>  	return 0;
>  }
> @@ -3740,7 +3738,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  	}
>  	cleanup_preferred_base();
>  	if (include_tag && nr_result)
> -		for_each_ref(add_ref_tag, NULL);
> +		for_each_tag_ref(add_ref_tag, NULL);

OK. Seeing another caller (builtin/pack-objects.c:compute_write_order())
that passes a callback to for_each_tag_ref() makes me feel more
comfortable about using it here.

Thanks for investigating and resolving this in a way which cleans up the
surrounding code.

>  	stop_progress(&progress_state);
>  	trace2_region_leave("pack-objects", "enumerate-objects",
>  			    the_repository);
> --
> 2.30.0

This version looks good to me, thanks for digging!

  Reviewed-by: Taylor Blau <me@xxxxxxxxxxxx>

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux