Jeff King <peff@xxxxxxxx> writes:

> So we weren't actually reading all of the commits even under the old
> code. We were just going into deref_tag(), seeing that the object is
> already parsed, and then quickly returning when we see that we already
> have an OBJ_COMMIT. I suspect your timing differences are mostly
> noise.

Noise that accounts for a fluctuation of ~7%? That sounds big.

> I guess --input=stdin-commits is a good way to simulate that. Try this
> (assuming there's already a split-graph file with all of the commits in
> it):
>
>   git rev-parse HEAD >input
>   time git commit-graph write --input=stdin-commits --split=merge-all <input
> ...
> A more realistic case would probably be feeding a new small pack to
> --input=stdin-packs.

Makes sense.

> At any rate, I think there is a demonstrable speedup there. But
> moreover, I'm pretty sure this existing code is not doing what it
> expects:
>
>         /* only add commits if they still exist in the repo */
>         result = lookup_commit_reference_gently(ctx->r, &oid, 1);
>
> That won't look at the object database at all if the commit is already
> marked as parsed. And that parsing might have come from the commit graph
> itself, as our earlier attempts showed. So switching to a real
> has_object_file() call is an important _correctness_ fix, even leaving
> aside the speed improvements.

True. has_object_file() asks the question without even the
lookup-replace flag set, so it would check whether the object is truly
there (modulo that the "pretended" ones declared to exist via the
cached-object mechanism also count as present).

Do we need to worry about INFO_QUICK and SKIP_FETCH_OBJECT in this
codepath, by the way?
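
For concreteness, I imagine the fix would be something along these
lines in merge_commit_graph(), where the quoted code lives, if I am
reading commit-graph.c correctly (just a sketch of the idea, not the
actual patch):

	load_oid_from_graph(g, i + offset, &oid);

	/* only add commits if they still exist in the repo */
	if (!has_object_file(&oid))
		continue;

	result = lookup_commit(ctx->r, &oid);
	if (result) {
		ctx->commits.list[ctx->commits.nr] = result;
		ctx->commits.nr++;
	}

That consults the object database itself, instead of trusting a
parsed bit that may have been planted on the in-core object by the
very commit-graph we are rewriting. Note that plain lookup_commit()
should suffice here: every object recorded in a commit-graph is a
commit, so the tag-dereferencing done by
lookup_commit_reference_gently() buys us nothing.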
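
And if the answer to that last question is "yes", the existence check
above would presumably become something like

	if (!has_object_file_with_flags(&oid, OBJECT_INFO_QUICK |
					OBJECT_INFO_SKIP_FETCH_OBJECT))
		continue;

OBJECT_INFO_QUICK refrains from re-scanning the pack directory when
the object is not found on the first attempt, and
OBJECT_INFO_SKIP_FETCH_OBJECT keeps a partial clone from lazily
fetching the object from the promisor remote just to answer an
existence check. Whether either behavior is desirable in this
codepath is exactly the question.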