Re: [PATCH 3/3] blame: use changed-path Bloom filters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> Subject: Re: [PATCH 3/3] blame: use changed-path Bloom filters

Ahh, I almost forgot we spell Bloom with capital B, so I should go
back and amend the title of [2/3].

> When computing a blame, there is a section in find_origin() that
> computes a diff between a commit and one of its parents. When this
> is the first parent, we can check the Bloom filters before calling
> diff_tree_oid().
>
> In order to make this work with the blame machinery, we need to
> initialize a struct bloom_key with the initial path. But also, we
> need to add more keys to a list if a rename is detected. We then
> check to see if _any_ of these keys answer "maybe" in the diff.

Yes.  I think after gaining experience with this technique, we may
be able to speed up the "git log --follow" while correcting its
semantics at the same time.  The prospect is unnervingly exciting.

> Generally, this is a performance enhancement and should not
> change the behavior of 'git blame' in any way.

Absolutely.

> The lack of improvement for the MAINTAINERS file and the relatively
> modest improvement for the other examples can be easily explained.
> The blame machinery needs to compute line-level diffs to determine
> which lines were changed by each commit. That makes up a large
> proportion of the computation time, and this change does not
> attempt to improve on that section of the algorithm. The
> MAINTAINERS file is large and changed often, so it takes time to
> determine which lines were updated by which commit. In contrast,
> the code files are much smaller, and it takes longer to comute
> the line-by-line diff for a single patch on the Linux mailing
> lists.

Yup, tree-diff for a deeper path would benefit the most, and your
numbers were indeed impressive.

> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  blame.c         | 139 ++++++++++++++++++++++++++++++++++++++++++++----
>  blame.h         |   6 +++
>  builtin/blame.c |  10 ++++
>  3 files changed, 146 insertions(+), 9 deletions(-)

I am kind-a surprised how little additional code it takes to do
this.  Good job.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux