Hi Jacob, On Wed, Jan 20, 2021 at 01:45:14PM +0100, Jacob Vosmaer wrote: > In git-pack-objects, we iterate over all the tags if the --include-tag > option is passed on the command line. For some reason this uses > for_each_ref which is expensive if the repo has many refs. We should > use for_each_tag_ref instead. I don't think it's worth sending another version, but I would have liked to see: "... because we can save time by only iterating over some of the refs" at the end of this paragraph. > Because the add_ref_tag callback will now only visit tags we > simplified it a bit. > > The motivation for this change is that we observed performance issues > with a repository on gitlab.com that has 500,000 refs but only 2,000 > tags. The fetch traffic on that repo is dominated by CI, and when we > changed CI to fetch with 'git fetch --no-tags' we saw a dramatic > change in the CPU profile of git-pack-objects. This lead us to this > particular ref walk. More details in: > https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746#note_483546598 > > Signed-off-by: Jacob Vosmaer <jacob@xxxxxxxxxx> > --- > builtin/pack-objects.c | 8 +++----- > 1 file changed, 3 insertions(+), 5 deletions(-) > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index 2a00358f34..ad52c91bdb 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -2803,13 +2803,11 @@ static void add_tag_chain(const struct object_id *oid) > } > } > > -static int add_ref_tag(const char *path, const struct object_id *oid, int flag, void *cb_data) > +static int add_ref_tag(const char *tag, const struct object_id *oid, int flag, void *cb_data) > { > struct object_id peeled; > > - if (starts_with(path, "refs/tags/") && /* is a tag? */ > - !peel_ref(path, &peeled) && /* peelable? */ > - obj_is_packed(&peeled)) /* object packed? */ > + if (!peel_ref(tag, &peeled) && obj_is_packed(&peeled)) > add_tag_chain(oid); > return 0; > } > @@ -3740,7 +3738,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > } > cleanup_preferred_base(); > if (include_tag && nr_result) > - for_each_ref(add_ref_tag, NULL); > + for_each_tag_ref(add_ref_tag, NULL); OK. Seeing another caller (builtin/pack-objects.c:compute_write_order()) that passes a callback to for_each_tag_ref() makes me feel more comfortable about using it here. Thanks for investigating and resolving this in a way which cleans up the surrounding code. > stop_progress(&progress_state); > trace2_region_leave("pack-objects", "enumerate-objects", > the_repository); > -- > 2.30.0 This version looks good to me, thanks for digging! Reviewed-by: Taylor Blau <me@xxxxxxxxxxxx> Thanks, Taylor