Re: [PATCH 1/2] blame: large-scale performance rewrite

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Apr 26, 2014 at 9:50 AM, David Kastrup <dak@xxxxxxx> wrote:
> Shawn Pearce <spearce@xxxxxxxxxxx> writes:
>
>> On Sat, Apr 26, 2014 at 12:48 AM, David Kastrup <dak@xxxxxxx> wrote:
>>> Shawn Pearce <spearce@xxxxxxxxxxx> writes:
>>>>
>>>> And JGit was already usually slower than git-core. Now it will be
>>>> even slower! :-)
>>>
>>> If your statement about JGit is accurate, it should likely have beat
>>> Git for large use cases (where the performance improvements are most
>>> important) as O(n) beats O(n^2) in the long run.
>>
>> Agreed.
>>
>> In a few cases yes, JGit did beat git-core at blame running time.
>> Unfortunately according to my profiling blame performance is still
>> dominated by inflation and scanning of commit and tree objects to
>> identify unmodified blobs and advance to the next scapegoat ancestor.
>
> Oh, the C version is most certainly significantly impacted by that after
> my patch.  One can _significantly_ speed it up by increasing
> core.deltaBaseCacheLimit from its rather silly value of 16M.  If you
> have a comparable control in JGit and if your benchmarking points to the
> unpacking, that's where I'd suggest tweaking first.

Good point, we have that control but I always forget to play with it
during benchmarking.

>>> Apart from the objective measurement of "total time", the more
>>> subjective impression of interactive/incremental response (like in
>>> git gui blame) where the order of results will significantly differ
>>> (current git-blame --incremental focuses on getting blames resolved
>>> in first-lines-first manner, the proposed git-blame rather works on a
>>> newest-commits-first basis which might better match typical use
>>> cases) might be worth reporting.
>>
>> Seeing this fill during execution was the initial motivation I had for
>> writing git gui blame.
>
> It does not look impressively better to me, actually.  Probably because
> "git gui blame" is running "git blame" several times.

IIRC git gui blame runs blame only twice. Once with the simple blame,
and then again with -M -C or something like that. Its a nice display
because you can see who moved code here, and then who originally wrote
it. Unfortunately it has to be done as two passes as the blame
implementation does not know how to compute both in one pass.

The move/copy detection is also computationally more expensive per
revision visited, so it really slows down the simple "who put it here"
if the two passes were somehow run in a single iteration.

>> I don't think anyone cares about the order it displays in. In fact
>> ordering my timestamp may be more what the user wants anyway, as you
>> suggest above.
>
> What the user wants anyway is that "git gui blame" notifies "git blame"
> of the currently displayed window area whenever that changes, and that
> git blame then _first_ deals with all chunks with a visible on-screen
> part.

That is a really good idea. Unfortunately finding that part of the
window in the blame data can still take a while. You may compute other
parts of the file anyway while looking for this part, as you had to
compare two revisions and scapegoat sections to find out that commit
didn't touch the interesting region and keep going back through
history.

You may get lucky and be able to fill in a recent region quickly, but
I somehow suspect giving priority to the visible region in the
priority queue won't really help to reduce latency of the visible
region for the user.

> The really tricky/expensive part was realizing that as opposed to target
> line number ranges, source line number ranges may overlap and/or be
> duplicate when using -M or -C options.  That really messed things up for
> a long time and was hard to debug.  Once I figured out what was going
> wrong, recoding the respective stuff was straightforward.

Right, and JGit blame still is missing the -M and -C options, as I
have not implemented those yet. I got basic blame and reverse blame
working a few years ago and then stopped working on the code for a
while. Now we have interest in improving the latency for $DAY_JOB, so
I've been poking at the code again for the last week or so.

But that -M and -C thing is still not implemented, and I know its
going to be tricky to get right with the way the scapegoating is
passed along.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]