"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > Subject: Re: [PATCH 3/3] blame: use changed-path Bloom filters Ahh, I almost forgot we spell Bloom with capital B, so I should go back and amend the title of [2/3]. > When computing a blame, there is a section in find_origin() that > computes a diff between a commit and one of its parents. When this > is the first parent, we can check the Bloom filters before calling > diff_tree_oid(). > > In order to make this work with the blame machinery, we need to > initialize a struct bloom_key with the initial path. But also, we > need to add more keys to a list if a rename is detected. We then > check to see if _any_ of these keys answer "maybe" in the diff. Yes. I think after gaining experience with this technique, we may be able to speed up the "git log --follow" while correcting its semantics at the same time. The prospect is unnervingly exciting. > Generally, this is a performance enhancement and should not > change the behavior of 'git blame' in any way. Absolutely. > The lack of improvement for the MAINTAINERS file and the relatively > modest improvement for the other examples can be easily explained. > The blame machinery needs to compute line-level diffs to determine > which lines were changed by each commit. That makes up a large > proportion of the computation time, and this change does not > attempt to improve on that section of the algorithm. The > MAINTAINERS file is large and changed often, so it takes time to > determine which lines were updated by which commit. In contrast, > the code files are much smaller, and it takes longer to comute > the line-by-line diff for a single patch on the Linux mailing > lists. Yup, tree-diff for a deeper path would benefit the most, and your numbers were indeed impressive. > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > --- > blame.c | 139 ++++++++++++++++++++++++++++++++++++++++++++---- > blame.h | 6 +++ > builtin/blame.c | 10 ++++ > 3 files changed, 146 insertions(+), 9 deletions(-) I am kind-a surprised how little additional code it takes to do this. Good job.