On Wed, Jun 30, 2021 at 01:45:03AM -0400, Jeff King wrote: > That implies to me that yes, it really is saving time, especially in the > cold-cache case. But if you have to do any actual fill-in traversal, the > benefits get totally lost in the noise. _Especially_ in the cold-cache > case, because then we're faulting in the actual object data, etc. That's definitely true. I would say that any patches in this direction would have the general sense of "this helps in some cases where we don't have to do much traversal by eliminating an unnecessarily eager part of loading bitmaps, and does not make anything worse when the bitmap coverage is incomplete (and requires traversing)." I would add that these effects change with the size of the bitmap. Let's just consider the "count the number of objects in a bitmapped commit". On my local copy of the kernel, I see a relatively modest improvement: $ tip=2ab38c17aac10bf55ab3efde4c4db3893d8691d2 $ hyperfine \ 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip' \ 'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip' \ --warmup=3 Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip Time (mean ± σ): 21.5 ms ± 5.6 ms [User: 8.7 ms, System: 12.7 ms] Range (min … max): 12.4 ms … 34.2 ms 170 runs Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip Time (mean ± σ): 10.6 ms ± 1.6 ms [User: 7.1 ms, System: 3.5 ms] Range (min … max): 4.5 ms … 11.9 ms 258 runs but on my copy of the kernel's fork network repo (that containing all of torvalds/linux's objects, as well as all of its fork's objects, too), the magnitude of the effect is much bigger: Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip Time (mean ± σ): 332.3 ms ± 12.6 ms [User: 210.4 ms, System: 121.8 ms] Range (min … max): 322.7 ms … 362.4 ms 10 runs Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip Time (mean ± σ): 260.0 ms ± 9.3 ms [User: 191.0 ms, System: 69.0 ms] Range (min … max): 250.4 ms … 272.8 ms 11 runs That's a more modest 1.28x improvement (versus 2.03x in just linux.git), but the overall magnitude is much bigger. This is basically an effect of the bitmaps themselves. In the latter example, there are more bitmaps (around 1.6k of them, versus just over 500 in my copy of just the kernel), and each of them are much wider (because there are far more objects, 40.2M versus just 7.8M). So there is more work to do, and the page cache is less efficient for us because we can't fit as much of the .bitmap file in the page cache at once. > By the way, one other thing I noticed is that having a fully-build > commit-graph also made a big difference (much bigger than this patch), > especially for the non-bitmapped commit. Which makes sense, since it is > saving us from loading commit objects from disk during fill-in > traversal. Likewise having an reverse index helps a lot, too. That radix sort scales linearly with the number of objects in the bitmapped pack (plus you're paying the cost to allocate more heap, etc). This clouded up some of my timings in p5310, which made me think that it would be a good idea to `git config pack.writeReverseIndex true` in the setup for those tests, but an even better direction would be to change the default of pack.writeReverseIndex to true everywhere. > So I dunno. There's absolutely savings for some cases, but I suspect in > practice it's not going to really be noticeable. Part of me says "well, > if it ever provides a benefit and there isn't a downside, why not?". So > just devil's advocating on downsides for a moment: > > - there's some extra complexity in the file format and code to read > and write these (and still fall back to the old system when they're > absent). I don't think it's a deal-breaker, as it's really not that > complicated a feature. I agree with both of these. The complexity is manageable, I think, especially since I dropped support for the extended offset table (having a bitmap file that is >2GiB seems extremely unlikely to me, and it's possible to add support for it in the future) and fanout table (there are usually less than <1k commits with bitmaps, so a 256-entry fanout table doesn't seem to help much in benchmarking). So what's left of the format is really just: - a table of object id's - a table of (uint32_t, uint32_t) tuples describing the (short) offset of the bitmap, and an index position of the xor'd bitmap (if one exists). > - we're using extra bytes on disk (and the associated cold block cache > effects there). It's not very many bytes, though (I guess 20 for the > hash, plus a few offset integers; if we wanted to really > penny-pinch, we could probably store 32-bit pointers to the hashes > in the associated .idx file, at the cost of an extra level of > indirection while binary searching). But that is only for a few > hundred commits that are bitmapped, not all of them. And it's > balanced by not needing to allocate a similar in-memory lookup table > in each command. So it's probably a net win. Worth benchmarking, at least. I'll be offline for the next ~week and a half for my wedding, but I'll post some patches to the list shortly after I get back. Thanks, Taylor