Re: [PATCH 5/5] load_ref_decorations(): avoid parsing non-tag objects

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 22 Jun 2021 19:06:43 +0200

On Tue, Jun 22 2021, Jeff King wrote:

> When we load the ref decorations, we parse the object pointed to by each
> ref in order to get a "struct object". This is unnecessarily expensive;
> we really only need the object struct, and don't even look at the parsed
> contents. The exception is tags, which we do need to peel.
>
> We can improve this by looking up the object type first (which is much
> cheaper), and skipping the parse entirely for non-tags. This increases
> the work slightly for annotated tags (which now do a type lookup _and_ a
> parse), but decreases it a lot for other types. On balance, this seems
> to be a good tradeoff.
>
> In my git.git clone, with ~2k refs, most of which are branches, the time
> to run "git log -1 --decorate" drops from 34ms to 11ms. Even on my
> linux.git clone, which contains mostly tags and only a handful of
> branches, the time drops from 30ms to 19ms. And on a more extreme
> real-world case with ~220k refs, mostly non-tags, the time drops from
> 2.6s to 650ms.
>
> That command is a lop-sided example, of course, because it does as
> little non-loading work as possible. But it does show the absolute time
> improvement. Even in something like a full "git log --decorate" on that
> extreme repo, we'd still be saving 2s of CPU time.
>
> Ideally we could push this even further, and avoid parsing even tags, by
> relying on the packed-refs "peel" optimization (which we could do by
> calling peel_iterated_oid() instead of peeling manually). But we can't
> do that here. The packed-refs file only stores the bottom-layer of the
> peel (so in a "tag->tag->commit" chain, it stores only the commit as the
> peel result).  But the decoration code wants to peel the layers
> individually, annotating the middle layers of the chain.
>
> If the packed-refs file ever learns to store all of the peeled layers,
> then we could switch to it. Or even if it stored a flag to indicate the
> peel was not multi-layer (because most of them aren't), then we could
> use it most of the time and fall back to a manual peel for the rare
> cases.
>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
>  log-tree.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/log-tree.c b/log-tree.c
> index 7b823786c2..8b700e9c14 100644
> --- a/log-tree.c
> +++ b/log-tree.c
> @@ -134,6 +134,7 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
>  			      int flags, void *cb_data)
>  {
>  	struct object *obj;
> +	enum object_type objtype;
>  	enum decoration_type type = DECORATION_NONE;
>  	struct decoration_filter *filter = (struct decoration_filter *)cb_data;
>  
> @@ -155,9 +156,10 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
>  		return 0;
>  	}
>  
> -	obj = parse_object(the_repository, oid);
> -	if (!obj)
> +	objtype = oid_object_info(the_repository, oid, NULL);
> +	if (objtype < 0)
>  		return 0;
> +	obj = lookup_object_by_type(the_repository, oid, objtype);

This series looks good. I just wonder, between this and my own
lookup_{blob,tree,tag,commit}_type() in [1], whether exposing some
function that sits between what we have now in parse_object() and
parse_object_buffer() wouldn't also do this for you.

I.e. in my patch if you pass a type into parse_object_buffer() I think
you'll get the same behavior.

To be clear, I see nothing wrong with this; it's more of a musing about
how some functions in object.c discover the type on their own, others
allow passing it in, and sometimes (worse before that series of mine) we
relay the not-real-but-inferred type, etc.
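
For anyone following along, here is a toy sketch of the "look up the
type first, parse only tags" idea from the quoted patch. It is not git
code: every toy_* name is hypothetical, the object store is a plain
array, and "parsing" is just a counter standing in for the expensive
step. It also assumes tag chains are acyclic.

```c
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in git. */
enum toy_type { TOY_BAD = -1, TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

struct toy_object {
	enum toy_type type;
	int tagged;	/* index of the tagged object; only used for TOY_TAG */
};

struct toy_store {
	const struct toy_object *objs;
	int nr;
	int parses;	/* number of expensive "parses" performed so far */
};

/* Cheap: report only the type, standing in for a header-only lookup. */
static enum toy_type toy_type_of(const struct toy_store *s, int idx)
{
	if (idx < 0 || idx >= s->nr)
		return TOY_BAD;
	return s->objs[idx].type;
}

/* Expensive: stands in for reading and parsing the object's contents. */
static const struct toy_object *toy_parse(struct toy_store *s, int idx)
{
	if (idx < 0 || idx >= s->nr)
		return NULL;
	s->parses++;
	return &s->objs[idx];
}

/*
 * Decorate the object at idx and, for tags, each layer of the peel.
 * Returns the number of layers decorated, or 0 if the object is missing.
 * Only tags are ever parsed; other types need just the type lookup.
 */
static int toy_decorate(struct toy_store *s, int idx)
{
	enum toy_type t = toy_type_of(s, idx);
	int layers = 0;

	if (t < 0)
		return 0;
	layers++;	/* the object the ref points at */

	while (t == TOY_TAG) {
		const struct toy_object *tag = toy_parse(s, idx);
		if (!tag)
			break;
		idx = tag->tagged;
		t = toy_type_of(s, idx);
		if (t < 0)
			break;
		layers++;	/* a middle or bottom layer of the chain */
	}
	return layers;
}
```

With a store holding a commit, a tag of that commit, and a tag of that
tag, decorating the commit leaves s.parses untouched, while decorating
the outer tag parses twice and yields three layers. That is the
per-layer peel the commit message says the packed-refs "peel" data
cannot provide.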

1. https://lore.kernel.org/git/patch-10.11-a84f670ac24-20210328T021238Z-avarab@xxxxxxxxx/