"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> ... deltas across path boundaries. This second pass is much faster than a fresh
> pass since the existing deltas are used as a limit for the size of
> potentially new deltas, short-circuiting the checks when the delta size
> exceeds the current-best.

Very nice.

> The microsoft/fluentui is a public Javascript repo that suffers from many of
> the name hash collisions as internal repositories I've worked with. Here is
> a comparison of the compressed size and end-to-end time of the repack:
>
> Repack Method    Pack Size       Time
> ---------------------------------------
> Hash v1             439.4M     87.24s
> Hash v2             161.7M     21.51s
> Path Walk           142.5M     28.16s
>
>
> Less dramatic, but perhaps more standardly structured is the nodejs/node
> repository, with these stats:
>
> Repack Method       Pack Size       Time
> ------------------------------------------
> Hash v1                739.9M     71.18s
> Hash v2                764.6M     67.82s
> Path Walk              698.0M     75.10s
>
>
> Even the Linux kernel repository gains some benefits, even though the number
> of hash collisions is relatively low due to a preference for short
> filenames:
>
> Repack Method       Pack Size       Time
> ------------------------------------------
> Hash v1                  2.5G    554.41s
> Hash v2                  2.5G    549.62s
> Path Walk                2.2G    559.00s

This third one, v2 not performing much better than v1, is quite
surprising.

> The drawbacks of the --path-walk feature is that it will be harder to
> integrate it with bitmap features, specifically delta islands. This is not
> insurmountable, but would require more work, such as a revision walk to
> paint objects with reachability information before using that during delta
> computations.
>
> However, there should still be significant benefits to Git clients trying to
> save space and improve local performance.

Sure.  More experiments and more approaches will eventually give us
overall improvement.
I am hoping that we will be able to condense the result of these
different approaches and their combinations into easy-to-choose-from
canned choices (as opposed to a myriad of little knobs the users need
to futz with without really understanding what they are tweaking).

> This feature was shipped with similar features in microsoft/git as of
> v2.47.0.vfs.0.3 [4]. This was used in CI machines for an internal monorepo
> that had significant repository growth due to constructing a batch of
> beachball [5] CHANGELOG.[md|json] files and pushing them to a release
> branch. These pushes were frequently 70-200 MB due to poor delta
> compression. Using the 'pack.usePathWalk=true' config, these pushes dropped
> in size by 100x while improving performance. Since these CI machines were
> working with a shallow clone, the 'edge_aggressive' changes were required to
> enable the path-walk option.

Nice, thanks.