On Tue, Jun 22 2021, Jeff King wrote: > When we load the ref decorations, we parse the object pointed to by each > ref in order to get a "struct object". This is unnecessarily expensive; > we really only need the object struct, and don't even look at the parsed > contents. The exception is tags, which we do need to peel. > > We can improve this by looking up the object type first (which is much > cheaper), and skipping the parse entirely for non-tags. This increases > the work slightly for annotated tags (which now do a type lookup _and_ a > parse), but decreases it a lot for other types. On balance, this seems > to be a good tradeoff. > > In my git.git clone, with ~2k refs, most of which are branches, the time > to run "git log -1 --decorate" drops from 34ms to 11ms. Even on my > linux.git clone, which contains mostly tags and only a handful of > branches, the time drops from 30ms to 19ms. And on a more extreme > real-world case with ~220k refs, mostly non-tags, the time drops from > 2.6s to 650ms. > > That command is a lop-sided example, of course, because it does as > little non-loading work as possible. But it does show the absolute time > improvement. Even in something like a full "git log --decorate" on that > extreme repo, we'd still be saving 2s of CPU time. > > Ideally we could push this even further, and avoid parsing even tags, by > relying on the packed-refs "peel" optimization (which we could do by > calling peel_iterated_oid() instead of peeling manually). But we can't > do that here. The packed-refs file only stores the bottom-layer of the > peel (so in a "tag->tag->commit" chain, it stores only the commit as the > peel result). But the decoration code wants to peel the layers > individually, annotating the middle layers of the chain. > > If the packed-refs file ever learns to store all of the peeled layers, > then we could switch to it. Or even if it stored a flag to indicate the > peel was not multi-layer (because most of them aren't), then we could > use it most of the time and fall back to a manual peel for the rare > cases. > > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- > log-tree.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/log-tree.c b/log-tree.c > index 7b823786c2..8b700e9c14 100644 > --- a/log-tree.c > +++ b/log-tree.c > @@ -134,6 +134,7 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid, > int flags, void *cb_data) > { > struct object *obj; > + enum object_type objtype; > enum decoration_type type = DECORATION_NONE; > struct decoration_filter *filter = (struct decoration_filter *)cb_data; > > @@ -155,9 +156,10 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid, > return 0; > } > > - obj = parse_object(the_repository, oid); > - if (!obj) > + objtype = oid_object_info(the_repository, oid, NULL); > + if (type < 0) > return 0; > + obj = lookup_object_by_type(the_repository, oid, objtype); This series looks good. I just wonder if between this and my own lookup_{blob,tree,tag,commit}_type() in [1] whether exposing some function between what we have now in parse_object() and parse_object_buffer() wouldn't also do this for you. I.e. in my patch if you pass a type into parse_object_buffer() I think you'll get the same behavior. To be clear I see nothing wrong with this, it's more of a musing about how some functions in object.c discover the type on their own, others allow passing it in, sometimes (worse before that series of mine) we relay the not-real-but-inferred-type etc. 1. https://lore.kernel.org/git/patch-10.11-a84f670ac24-20210328T021238Z-avarab@xxxxxxxxx/