On 12/22/2019 4:30 AM, Jeff King wrote: > On Fri, Dec 20, 2019 at 10:05:11PM +0000, Garima Singh via GitGitGadget wrote: > >> Adopting changed path bloom filters has been discussed on the list before, >> and a prototype version was worked on by SZEDER Gábor, Jonathan Tan and Dr. >> Derrick Stolee [1]. This series is based on Dr. Stolee's approach [2] and >> presents an updated and more polished RFC version of the feature. > > Great to see progress here. I probably won't have time to review this > carefully before the new year, but I did notice some low-hanging fruit > on the generation side. > > So here are a few patches to reduce the CPU and memory usage. They could > be squashed in at the appropriate spots, or perhaps taken as inspiration > if there are better solutions (especially for the first one). > > I think we could go further still, by actually doing a non-recursive > diff_tree_oid(), and then recursing into sub-trees ourselves. That would > save us having to split apart each path to add the leading paths to the > hashmap (most of which will be duplicates if the commit touched "a/b/c" > and "a/b/d", etc). I doubt it would be that huge a speedup though. We > have to keep a list of the touched paths anyway (since the bloom key > parameters depend on the number of entries), and most of the time is > almost certainly spent inflating the trees in the first place. However > it might be easier to follow the code, and it would make it simpler to > stop traversing at the 512-entry limit, rather than generating a huge > diff only to throw it away. Thanks for these improvements. This diff machinery is new to us (Garima and myself). Here are some recommendations (to Garima) for how to proceed with these patches. Please let me know if anyone disagrees. > [1/3]: commit-graph: examine changed-path objects in pack order This one is best kept as its own patch, as it shows a clear reason why we want to do the sort-by-position. It would also be a complicated patch to include this logic along with the first use of compute_bloom_filters(). > [2/3]: commit-graph: free large diffs, too This one seems best to squash into "commit-graph: write changed paths bloom filters" with a Helped-by for Peff. > [3/3]: commit-graph: stop using full rev_info for diffs While I appreciate the clear benefit in the commit-message here, it may be best to also squash this one similarly. Of course, if we create our own diff logic with the short-circuit capability, then perhaps these patches become obsolete. I'll spend a little time playing with options here. Thanks! -Stolee