Re: [PATCH 1/2] blame: large-scale performance rewrite

Shawn Pearce <spearce@xxxxxxxxxxx> · Fri, 25 Apr 2014 17:53:31 -0700

On Fri, Apr 25, 2014 at 4:56 PM, David Kastrup <dak@xxxxxxx> wrote:
> The previous implementation used a single sorted linear list of blame
> entries for organizing all partial or completed work.  Every subtask had
> to scan the whole list, with most entries not being relevant to the
> task.  The resulting run time was quadratic in the number of separate
> chunks.
>
> This change gives every subtask its own data to work with.  Subtasks are
> organized into "struct origin" chains hanging off particular commits.
> Commits are organized into a priority queue and processed in commit
> date order, so that most of the work affecting a particular blob stays
> collated even in the presence of an extensive merge history.

Without reading the code, this sounds like how JGit runs blame.
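
For readers following along, here is a rough sketch of the scheme described
in the quoted text. It is a toy model, not the patch and not JGit's code.
The name blame(), the commit table layout, and the "inherited" map are all
made up for illustration, and it assumes line numbers never shift between
commits, which real blame cannot assume. Each commit gets its own list of
pending lines, and commits are popped from a priority queue, newest first,
so a shared ancestor is visited once after both sides of a merge have
handed it their lines.

```python
import heapq

def blame(commits, start, num_lines):
    """Toy blame. commits maps id -> (date, inherited), where inherited maps
    a line number to the parent that line was taken from unchanged. A line
    missing from inherited is attributed to the commit itself.
    (Hypothetical layout, for illustration only.)"""
    pending = {start: list(range(num_lines))}   # per-commit work lists
    heap = [(-commits[start][0], start)]        # newest commit date first
    result, order = {}, []
    while heap:
        _, cid = heapq.heappop(heap)
        order.append(cid)
        _, inherited = commits[cid]
        for line in pending.pop(cid):
            parent = inherited.get(line)
            if parent is None:
                result[line] = cid              # this commit introduced the line
            else:
                if parent not in pending:       # queue the parent only once
                    pending[parent] = []
                    heapq.heappush(heap, (-commits[parent][0], parent))
                pending[parent].append(line)    # pass the line down
    return result, order

# A is the root; B and C each change one line; M merges B and C.
history = {
    "A": (1, {}),
    "B": (2, {0: "A", 2: "A", 3: "A"}),
    "C": (3, {0: "A", 1: "A", 3: "A"}),
    "M": (4, {0: "B", 1: "B", 2: "C", 3: "C"}),
}
blamed, visited = blame(history, "M", 4)
```

Because every commit only looks at its own pending list, no step scans
entries that belong to some other commit, which is the point of the change
as described above.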

> For large files with a diversified history, a speedup by a factor of 3
> or more is not unusual.

And JGit was already usually slower than git-core. Now it will be even
slower! :-)