On Tue, Aug 03, 2021 at 02:56:49PM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@xxxxxx> writes: > > > I wonder what our stance on this is. I can definitely understand the > > angle that this would be a deal breaker given that we now claim commits > > exist which don't anymore. > > An optimization that produces a wrong result very fast is a useless > optimization that has no place in our codebase. But don't we have > some clue recorded in the commit graph file that tells us with what > packfile the graph is to be used (iow, if the named packfile still > exists there, the objects recorded in the graph file are to be found > there) or something? Unfortunately, no. For bitmaps we have this information given that a bitmap is tied to a specific pack anyway. But for commit-graphs, the story is different given that they don't really care about the packs per se, but only about the commits. I was briefly wondering whether we can somehow use generation numbers to cut off parsing some commits: given we have already observed a commit with generation number N and we have determined that this commit's object exists, and we now see a commit with generation number M with M<N, then we can skip the object lookup because M is reachable by N and thus its object must exist. But generation numbers cannot determine reachability, but only unreachability, so I fear this is not possible. We can do the following on top though: diff --git a/revision.c b/revision.c index 3527ef3f65..9e62de20ab 100644 --- a/revision.c +++ b/revision.c @@ -368,6 +368,8 @@ static struct object *get_reference(struct rev_info *revs, const char *name, object = NULL; goto out; } + } else if (!repo_has_object_file(revs->repo, oid)) { + die("bad object %s", name); } } We assert that the object exists, but `repo_has_object_file()` won't try to unpack the object header given that we request no info about the object. And because the object ID has been part of the commit-graph, we know that it's a commit. It's a bit slower compared to the version where we don't assert object existence, but still a lot faster compared to looking up the object type via the ODB: Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev) Time (mean ± σ): 4.512 s ± 0.057 s [User: 4.131 s, System: 0.381 s] Range (min … max): 4.435 s … 4.632 s 10 runs Benchmark #2: without-existence rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev) Time (mean ± σ): 2.903 s ± 0.022 s [User: 2.533 s, System: 0.369 s] Range (min … max): 2.878 s … 2.954 s 10 runs Benchmark #3: with-existence: rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev) Time (mean ± σ): 3.071 s ± 0.014 s [User: 2.712 s, System: 0.358 s] Range (min … max): 3.050 s … 3.088 s 10 runs Summary 'without-existence: rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev)' ran 1.06 ± 0.01 times faster than 'with-existance: rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev)' 1.55 ± 0.02 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev)' Patrick
Attachment:
signature.asc
Description: PGP signature