Re: [PATCH v2 00/17] pack-objects: add --path-walk option for better deltas

On 10/28/24 1:25 PM, Taylor Blau wrote:
> On Mon, Oct 28, 2024 at 01:13:15PM -0400, Derrick Stolee wrote:
>> You are correct that this is not compatible with those features as-is.
>> _Maybe_ there is potential to integrate them in the future, but that
>> would require better understanding of whether the new compression
>> mechanism is valuable in enough cases (final storage size or maybe even
>> in repacking time).
>
> I think the bitmap thing is not too big of a hurdle. The .bitmap file is
> the only spot we store name-hash values on-disk in the "hashcache"
> extension.
>
> Unfortunately, there is no easy way to reuse the format of the existing
> hashcache extension as-is to indicate to the reader whether they are
> recording traditional name-hash values, or the new --path-walk hash
> values.

The --path-walk option does not mess with the name-hash. You're thinking
of the --full-name-hash feature [1] that was pulled out due to a lack of
interest (and better results with --path-walk).

[1] https://lore.kernel.org/git/pull.1785.git.1725890210.gitgitgadget@xxxxxxxxx/
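
For readers following along, here is a minimal sketch of the traditional
name-hash being discussed, assuming it follows the shift-and-add scheme
used by pack-objects. The function name name_hash_sketch is hypothetical
and chosen for illustration only; it is not Git's API.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of the traditional name-hash (hypothetical helper name).
 * Each new character shifts the running value down by two bits, so
 * the trailing characters of the path dominate the result.
 */
static uint32_t name_hash_sketch(const char *name)
{
	uint32_t hash = 0;

	if (name == NULL)
		return 0;

	for (; *name != '\0'; name++) {
		unsigned char c = (unsigned char)*name;

		/* Whitespace does not contribute to the hash. */
		if (isspace(c))
			continue;
		hash = (hash >> 2) + ((uint32_t)c << 24);
	}
	return hash;
}
```

Because the last characters count the most, paths with the same ending
(for example, two files named "Makefile" in different directories) hash
close together regardless of their leading directories.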

>> At the very least, it would be helpful if some other large repos were
>> tested to see how commonly this could help client-side users. Are
>> there other aspects to a repo's structure that could be important to
>> how effective this approach is?
>
> What measurements are you looking for here? I thought that you had
> already done an extensive job of measuring the client-side impact of
> pushing smaller packs and faster local repacks, no?

I've done what I can with the repos I know about, but perhaps other
folks have other repos they like to test that might present new
aspects to the problem.

For example, a colleague was testing this in a variety of JavaScript
repos and found that the node repo [2] was slightly worse with the
--path-walk option. I've since discovered that this is only true when
using a checked-out copy and the .git/index file is iterated, as some
large source files with few versions become split across the boundary
of "in the index" or "in commit history". (I am fixing this aspect as
well in the next iteration, hence some reason for its delay.)

[2] https://github.com/nodejs/node

Thanks,
-Stolee




