Re: [PATCH v2 3/4] read-cache: use hashfile instead of git_hash_ctx

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/17/2021 6:13 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
> 
>> ...
>> mult-pack-indexes, and commit-graphs. Therefore, it seems prudent to
> 
> multi-pack, I would say.
> 
>> There are still some remaining: the extension headers are hashed for use
> 
> some remaining what?  I first read an unwritten word as "issues",
> but I think the answer is "uses of git_hash_ctx".

Thanks for pointing these out. I will fix them.

>> in the End of Index Entries (EOIE) extension. This use of the
>> git_hash_ctx is left as-is. There are multiple reasons to not use a
>> hashfile here, including ...
> 
>> In addition to the test suite passing, I computed indexes using the
>> previous binaries and the binaries compiled after this change, and found
>> the index data to be exactly equal. Finally, I did extensive performance
>> testing of "git update-index --force-write" on repos of various sizes,
>> including one with over 2 million paths at HEAD. These tests
>> demonstrated less than 1% difference in behavior, so the performance
>> should be considered identical.
> 
> Hmph, does that mean 128k buffer is overkill and if we wanted to
> unify the buffer sizes we should have used 8k instead?

The buffer was previously increased to 128k because it makes a
difference in performance when writing the index.

The thing I'm measuring here is the difference between the old
writing code and the new hashfile code. Using the hashfile API
(with an identical buffer size) does not have a meaningful
performance impact, as it should.

I can make this clearer.

> Wait, the removal of fsync has made things faster in general, hasn't
> it?  Did something else degrade performance to cancel that gain?

Are you thinking about [1], which originally was talking about a
change to fsync() calls, but really ended up just making the same
behavior more readable?

[1] https://lore.kernel.org/git/pull.914.v2.git.1616762291574.gitgitgadget@xxxxxxxxx/

I was focused on that because I had initially seen a performance
degradation when I did this refactor. It turns out that my measurements
were not robust enough to the noise, which has been remedied.

> The patch looks an obvious improvement.  What was open-coded in
> longhand is now a well structured series of API calls and the result
> is much easier to follow and maintain.

That is the goal. I'm glad you agree.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux