On 4/11/2023 5:40 PM, Taylor Blau wrote: > On Tue, Apr 11, 2023 at 09:54:08AM -0400, Derrick Stolee wrote: >> On 4/10/2023 6:53 PM, Taylor Blau wrote: >>> In the vast majority of cases, this trade-off favors the on-disk ".rev" >>> files. But in certain cases, the in-memory variant performs more >>> favorably. Since these cases are narrow, and performance is machine- and >>> repository-dependent, this series also introduces a new configuration >>> option to disable reading ".rev" files in the third commit. >> >> I agree that the performance trade-off indicates that having the .rev >> files is preferred. It makes operations that _can_ be very fast as fast >> as possible (inspecting a small number of objects is much faster because >> we don't generate the in-memory index in O(N) time) and you create a knob >> for disabling it in the case that we are already doing something that >> inspects almost-all objects. > > Sweet; I'm glad that you agree. > > FWIW for non-GitHub folks, observing a slow-down here has never been an > issue for us. So much so that I wrote the pack.readReverseIndex knob > yesterday for the purpose of sending this series. > > That said, I think that including it here is still worthwhile, since the > cases where performance really suffers (e.g., `git cat-file > --batch-all-objects --batch-check='%(objectsize:disk)'`) isn't something > that GitHub runs regularly if at all. > > To accommodate different workflows, I think having the option to opt-out > of reading the on-disk ".rev" files is worthwhile. The only thing I can think of that would actually use this kind of behavior is git-sizer, but even that doesn't actually report the on-disk size (yet) and instead inflates all deltas when reporting size counts. The difference in performance here is likely minimal for that tool. >> This was an easy series to read. I applied the patches locally and poked >> around in the resulting code as I went along. This led to a couple places >> where I recommend a few changes, including a new patch that wires >> repository pointers through a few more method layers. > > Thanks for taking a look. Based on your review, there are only a couple > of things on my mind: > > - I amended the last patch to more clearly state when we would want to > run the suite GIT_TEST_NO_WRITE_REV_INDEXES=1 set, and kept it in > the linux-TEST-vars configuration. I think this is the right thing to do. Thanks. > - How do you want to handle that extra patch? As I noted in [1], I > think squashing the two together in one way or another makes sense. > So really we have to figure out (a) if you think that is the right > way to go, and (b) if so, how to handle attribution / the commit > message. Squashing makes sense. You could make me a co-author, or not. It's the natural thing to do once the problem to solve is identified. Thanks, -Stolee