On Fri, Dec 20, 2019 at 10:05:11PM +0000, Garima Singh via GitGitGadget wrote:

> Adopting changed path bloom filters has been discussed on the list before,
> and a prototype version was worked on by SZEDER Gábor, Jonathan Tan and Dr.
> Derrick Stolee [1]. This series is based on Dr. Stolee's approach [2] and
> presents an updated and more polished RFC version of the feature.

Great to see progress here. I probably won't have time to review this
carefully before the new year, but I did notice some low-hanging fruit
on the generation side.

So here are a few patches to reduce the CPU and memory usage. They could
be squashed in at the appropriate spots, or perhaps taken as inspiration
if there are better solutions (especially for the first one).

I think we could go further still, by actually doing a non-recursive
diff_tree_oid(), and then recursing into sub-trees ourselves. That would
save us having to split apart each path to add the leading paths to the
hashmap (most of which will be duplicates if the commit touched "a/b/c"
and "a/b/d", etc).

I doubt it would be that huge a speedup though. We have to keep a list
of the touched paths anyway (since the bloom key parameters depend on
the number of entries), and most of the time is almost certainly spent
inflating the trees in the first place. However it might be easier to
follow the code, and it would make it simpler to stop traversing at the
512-entry limit, rather than generating a huge diff only to throw it
away.

  [1/3]: commit-graph: examine changed-path objects in pack order
  [2/3]: commit-graph: free large diffs, too
  [3/3]: commit-graph: stop using full rev_info for diffs

 bloom.c        | 18 +++++++++---------
 commit-graph.c | 34 +++++++++++++++++++++++++++++++++-
 2 files changed, 42 insertions(+), 10 deletions(-)

-Peff