Jeff King <peff@xxxxxxxx> writes:

> So we weren't actually reading all of the commits even under the old
> code. We were just going into deref_tag(), seeing that the object is
> already parsed, and then quickly returning when we see that we already
> have an OBJ_COMMIT. I suspect your timing differences are mostly
> noise.

Noise that accounts for a fluctuation of ~7%? That sounds big.

> I guess --input=stdin-commits is a good way to simulate that. Try this
> (assuming there's already a split-graph file with all of the commits in
> it):
>
>   git rev-parse HEAD >input
>   time git commit-graph write --input=stdin-commits --split=merge-all <input
> ...
> A more realistic case would probably be feeding a new small pack to
> --input=stdin-packs.

Makes sense.

> At any rate, I think there is a demonstrable speedup there. But
> moreover, I'm pretty sure this existing code is not doing what it
> expects:
>
>         /* only add commits if they still exist in the repo */
>         result = lookup_commit_reference_gently(ctx->r, &oid, 1);
>
> That won't look at the object database at all if the commit is already
> marked as parsed. And that parsing might have come from the commit graph
> itself, as our earlier attempts showed. So switching to a real
> has_object_file() call is an important _correctness_ fix, even leaving
> aside the speed improvements.

True. has_object_file() asks the question without even the
lookup-replace flag set, so it would check whether the object is truly
there (modulo that the "pretended" ones declared to exist via the
cached-object mechanism also count as present).

Do we need to worry about INFO_QUICK and SKIP_FETCH_OBJECT in this
codepath, by the way?
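
For concreteness, I imagine the fix would be something along these
lines in merge_commit_graph(), where the quoted code lives, if I am
reading commit-graph.c correctly (just a sketch of the idea, not the
actual patch):

	load_oid_from_graph(g, i + offset, &oid);

	/* only add commits if they still exist in the repo */
	if (!has_object_file(&oid))
		continue;

	result = lookup_commit(ctx->r, &oid);
	if (result) {
		ctx->commits.list[ctx->commits.nr] = result;
		ctx->commits.nr++;
	}

That consults the object database itself, instead of trusting a
parsed bit that may have been planted on the in-core object by the
very commit-graph we are rewriting. Note that plain lookup_commit()
should suffice here: every object recorded in a commit-graph is a
commit, so the tag-dereferencing done by
lookup_commit_reference_gently() buys us nothing.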
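
And if the answer to that last question is "yes", the existence check
above would presumably become something like

	if (!has_object_file_with_flags(&oid, OBJECT_INFO_QUICK |
					OBJECT_INFO_SKIP_FETCH_OBJECT))
		continue;

OBJECT_INFO_QUICK refrains from re-scanning the pack directory when
the object is not found on the first attempt, and
OBJECT_INFO_SKIP_FETCH_OBJECT keeps a partial clone from lazily
fetching the object from the promisor remote just to answer an
existence check. Whether either behavior is desirable in this
codepath is exactly the question.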