On Thu, Sep 17, 2020 at 10:20:03AM +0200, Rasmus Villemoes wrote: > Hi, > > While poking around the code, I noticed that it seems > ->approximate_object_count_valid is never set to 1, and it never has > been, not even back when it was a global variable. So perhaps it can > just be removed and the logic depending on it simplified? Or am I > missing some preprocessor trickery. > > Nobody seems to have noticed the lack of caching - and actually setting > it to 1 after the count has been computed might be a little dangerous > unless one takes care to invalidate the cache anywhere that might be > relevant. We should be able to construct a test where it matters. The main cost that the flag is overcoming is the iteration through the packs. So we'd want a lot of packs. And the primary place the function would get called a lot is when abbreviating commits. So doing: for i in $(seq 1000); do echo blob echo 'data <<EOF' echo $i echo EOF echo checkpoint done | git -c transfer.unpacklimit=0 fast-import will get us a lot of packs. I tried that in linux.git. And then this should get us a baseline for how much it costs to traverse and print out object names: $ time git rev-list --format=%H HEAD >/dev/null real 0m6.636s user 0m6.492s sys 0m0.144s And now let's see how long it takes with abbreviation: $ time git rev-list --format=%h HEAD >/dev/null real 0m34.518s user 0m34.253s sys 0m0.264s Yow. That's a lot. But part of the cost is that we have to look up each abbreviated hash in each pack to see if it's present there, so we'd expect it to be a lot more expensive. But let's try it with the caching flag: diff --git a/packfile.c b/packfile.c index 9ef27508f2..e69012e7f2 100644 --- a/packfile.c +++ b/packfile.c @@ -923,6 +923,7 @@ unsigned long repo_approximate_object_count(struct repository *r) count += p->num_objects; } r->objects->approximate_object_count = count; + r->objects->approximate_object_count_valid = 1; } return r->objects->approximate_object_count; } $ time git rev-list --format=%h HEAD >/dev/null real 0m29.411s user 0m29.150s sys 0m0.260s Still not great, but caching the count did save us 15%. That seems worth it to me (1000 packs is more than we'd hope for, but not uncommon in a poorly maintained repo). The failure to set the flag is just a bug; looks like mine from 8e3f52d778 (find_unique_abbrev: move logic out of get_short_sha1(), 2016-10-03). You're right that caching runs the risk of the cache being invalidated. But in this case I think we're covered. We'd generally modify packed_git only via reprepare_packed_git(), prepare_packed_git(), and we do reset the flag there. Plus count is only meant to be approximate, so even if it ended up stale within a single process, I don't think it would be that big a deal. -Peff