On Mon, Oct 08, 2018 at 12:57:34PM -0400, Derrick Stolee wrote:
> On 10/8/2018 12:41 PM, SZEDER Gábor wrote:
> >On Wed, Oct 03, 2018 at 03:18:05PM -0400, Jeff King wrote:
> >>I'm still excited about the prospect of a bloom filter for paths which
> >>each commit touches. I think that's the next big frontier in getting
> >>things like "git log -- path" to a reasonable run-time.
> >There is certainly potential there.  With a (very) rough PoC
> >experiment, an 8MB bloom filter, and a carefully chosen path I can
> >achieve a nice, almost 25x speedup:
> >
> >   $ time git rev-list --count HEAD -- t/valgrind/valgrind.sh
> >   6
> >
> >   real    0m1.563s
> >   user    0m1.519s
> >   sys     0m0.045s
> >
> >   $ time GIT_USE_POC_BLOOM_FILTER=y ~/src/git/git rev-list --count HEAD -- t/valgrind/valgrind.sh
> >   6
> >
> >   real    0m0.063s
> >   user    0m0.043s
> >   sys     0m0.020s
> >
> >   bloom filter total queries: 16269 definitely not: 16195 maybe: 74 false positives: 64 fp ratio: 0.003934
> Nice! These numbers make sense to me, in terms of how many TREESAME queries
> we actually need to perform for such a query.

Yeah... because you didn't notice that I deliberately cheated :)

As it turned out, it's not just about the number of diff queries that
we can spare; for the speedup _ratio_, it's more about how expensive
those diff queries are.

git.git has a rather flat hierarchy: 't/' is the 372nd entry in the
current root tree object, while 'valgrind/' is the 923rd entry in
't/', and the diff machinery spends considerable time wading through
the preceding entries.  Notice the "carefully chosen path" remark in
my previous email; I think this particular path has the highest number
of preceding tree entries, and, in addition, 't/' changes rather
frequently, so the diff machinery often has to scan two relatively big
tree objects.  Had I chosen 'Documentation/RelNotes/1.5.0.1.txt'
instead, i.e. another path two directories deep, but whose leading
path components are both near the beginning of the tree objects, the
speedup would be much less impressive: 0.282s vs. 0.049s, i.e. "only"
~5.7x instead of ~24.8x.

> >But I'm afraid it will take a while until I get around to turn it into
> >something presentable...
> Do you have the code pushed somewhere public where one could take a
> look? I could provide some early feedback.

Nah, definitely not...  I know full well how embarrassingly broken this
implementation is, I don't need others to tell me that ;)
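
To illustrate the "definitely not" / "maybe" split in the statistics above, here is a minimal sketch of the general idea.  It is not the PoC (which is not public): the class and function names, the sizing, and the double-hashing scheme over SHA-256 are all assumptions made for the sake of the example.  The filter is keyed on (commit, path) pairs, and the expensive tree diff runs only when the filter answers "maybe".

```python
import hashlib


class PathBloomFilter:
    """Toy Bloom filter over (commit, path) pairs; hypothetical, not git's format."""

    def __init__(self, num_bits=1 << 16, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, commit, path):
        # Derive k bit positions from one digest via double hashing.
        digest = hashlib.sha256(commit.encode() + b"\0" + path.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, commit, path):
        for pos in self._positions(commit, path):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def maybe_touches(self, commit, path):
        # False means "definitely not"; True means "maybe" (could be a false positive).
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(commit, path))


def count_commits_touching(commits, path, bloom, slow_diff):
    """Count commits modifying `path`, calling slow_diff only on "maybe" answers."""
    count = diffs_run = 0
    for commit in commits:
        if not bloom.maybe_touches(commit, path):
            continue  # definitely not: the tree diff is skipped entirely
        diffs_run += 1
        if slow_diff(commit, path):  # stand-in for the real tree diff
            count += 1
    return count, diffs_run
```

The filter only reduces how many diffs are run; how much wall-clock time that saves still depends on how expensive each skipped diff would have been, which is the point about the position of 't/' and 'valgrind/' in their trees.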