On Wed, Feb 05, 2020 at 10:56:19PM +0000, Garima Singh via GitGitGadget wrote:
> Hey!
>
> The commit graph feature brought in a lot of performance improvements across
> multiple commands. However, file based history continues to be a performance
> pain point, especially in large repositories.
>
> Adopting changed path bloom filters has been discussed on the list before,
> and a prototype version was worked on by SZEDER Gábor, Jonathan Tan and Dr.
> Derrick Stolee [1]. This series is based on Dr. Stolee's proof of concept in
> [2]
>
> Performance Gains: We tested the performance of git log -- path on the git
> repo, the linux repo and some internal large repos, with a variety of paths
> of varying depths.
>
> On the git and linux repos: We observed a 2x to 5x speed up.
>
> On a large internal repo with files seated 6-10 levels deep in the tree: We
> observed 10x to 20x speed ups, with some paths going up to 28 times faster.
>
> Future Work (not included in the scope of this series):
>
>  1. Supporting multiple path based revision walk
>  2. Adopting it in git blame logic.
>  3. Interactions with line log git log -L
>
> ----------------------------------------------------------------------------
>
> Updates since the last submission
>
>  * Removed all the RFC callouts, this is a ready for full review version

Don't know when I'll find enough time to properly review the series.
Maybe someday...

>  * Added unit tests for the bloom filter computation layer

This fails on big endian, e.g. in Travis CI's s390x build:

  https://travis-ci.org/szeder/git-cooking-topics-for-travis-ci/jobs/647253022#L2210

(The link highlights the failure, but I'm afraid your browser won't jump
there right away; you'll have to click on the print-test-failures fold at
the bottom, and scroll down a bit...)
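
The failure report above does not say where the byte-order dependency
comes from. Purely as a generic illustration, and not as a diagnosis of
this series, the sketch below shows one common way hashing or filter
code ends up giving different results on s390x than on x86: reading
multi-byte words in host order instead of assembling them byte by byte.
The helper names (load_le32, load_native32, host_is_little_endian) are
hypothetical and do not come from the patches.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: assemble a 32-bit value from 4 bytes in
 * little-endian order.  The result is the same on every host.
 */
uint32_t load_le32(const unsigned char *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: reinterpret 4 bytes in host byte order.
 * The result depends on the endianness of the machine.
 */
uint32_t load_native32(const unsigned char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
	const uint32_t one = 1;
	unsigned char first;

	memcpy(&first, &one, 1);
	return first == 1;
}
```

With the bytes 01 02 03 04, load_le32() yields 0x04030201 everywhere,
while load_native32() matches it only on little-endian hosts. A test
that compares against hard-coded expected values therefore needs the
byte-by-byte form (or an explicit conversion) to pass on both.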