Re: [PATCH 0/7] pack-revindex: enable on-disk reverse indexes by default

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/11/2023 5:40 PM, Taylor Blau wrote:
> On Tue, Apr 11, 2023 at 09:54:08AM -0400, Derrick Stolee wrote:
>> On 4/10/2023 6:53 PM, Taylor Blau wrote:
>>> In the vast majority of cases, this trade-off favors the on-disk ".rev"
>>> files. But in certain cases, the in-memory variant performs more
>>> favorably. Since these cases are narrow, and performance is machine- and
>>> repository-dependent, this series also introduces a new configuration
>>> option to disable reading ".rev" files in the third commit.
>>
>> I agree that the performance trade-off indicates that having the .rev
>> files is preferred. It makes operations that _can_ be very fast as fast
>> as possible (inspecting a small number of objects is much faster because
>> we don't generate the in-memory index in O(N) time) and you create a knob
>> for disabling it in the case that we are already doing something that
>> inspects almost-all objects.
> 
> Sweet; I'm glad that you agree.
> 
> FWIW for non-GitHub folks, observing a slow-down here has never been an
> issue for us. So much so that I wrote the pack.readReverseIndex knob
> yesterday for the purpose of sending this series.
> 
> That said, I think that including it here is still worthwhile, since the
> cases where performance really suffers (e.g., `git cat-file
> --batch-all-objects --batch-check='%(objectsize:disk)'`) isn't something
> that GitHub runs regularly if at all.
> 
> To accommodate different workflows, I think having the option to opt-out
> of reading the on-disk ".rev" files is worthwhile.

The only thing I can think of that would actually use this kind of
behavior is git-sizer, but even that doesn't actually report the on-disk
size (yet) and instead inflates all deltas when reporting size counts. The
difference in performance here is likely minimal for that tool.
 
>> This was an easy series to read. I applied the patches locally and poked
>> around in the resulting code as I went along. This led to a couple places
>> where I recommend a few changes, including a new patch that wires
>> repository pointers through a few more method layers.
> 
> Thanks for taking a look. Based on your review, there are only a couple
> of things on my mind:
> 
>   - I amended the last patch to more clearly state when we would want to
>     run the suite GIT_TEST_NO_WRITE_REV_INDEXES=1 set, and kept it in
>     the linux-TEST-vars configuration.

I think this is the right thing to do. Thanks.

>   - How do you want to handle that extra patch? As I noted in [1], I
>     think squashing the two together in one way or another makes sense.
>     So really we have to figure out (a) if you think that is the right
>     way to go, and (b) if so, how to handle attribution / the commit
>     message.

Squashing makes sense. You could make me a co-author, or not. It's the
natural thing to do once the problem to solve is identified.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux