Re: [PATCH 1/2] blame: large-scale performance rewrite

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Shawn Pearce <spearce@xxxxxxxxxxx> writes:

> On Sat, Apr 26, 2014 at 12:48 AM, David Kastrup <dak@xxxxxxx> wrote:
>> Shawn Pearce <spearce@xxxxxxxxxxx> writes:
>>>
>>> And JGit was already usually slower than git-core. Now it will be
>>> even slower! :-)
>>
>> If your statement about JGit is accurate, it should likely have beat
>> Git for large use cases (where the performance improvements are most
>> important) as O(n) beats O(n^2) in the long run.
>
> Agreed.
>
> In a few cases yes, JGit did beat git-core at blame running time.
> Unfortunately according to my profiling blame performance is still
> dominated by inflation and scanning of commit and tree objects to
> identify unmodified blobs and advance to the next scapegoat ancestor.

Oh, the C version is most certainly significantly impacted by that after
my patch.  One can _significantly_ speed it up by increasing
core.deltaBaseCacheLimit from its rather silly value of 16M.  If you
have a comparable control in JGit and if your benchmarking points to the
unpacking, that's where I'd suggest tweaking first.

>> Apart from the objective measurement of "total time", the more
>> subjective impression of interactive/incremental response (like in
>> git gui blame) where the order of results will significantly differ
>> (current git-blame --incremental focuses on getting blames resolved
>> in first-lines-first manner, the proposed git-blame rather works on a
>> newest-commits-first basis which might better match typical use
>> cases) might be worth reporting.
>
> Seeing this fill during execution was the initial motivation I had for
> writing git gui blame.

It does not look impressively better to me, actually.  Probably because
"git gui blame" is running "git blame" several times.

> I don't think anyone cares about the order it displays in. In fact
> ordering my timestamp may be more what the user wants anyway, as you
> suggest above.

What the user wants anyway is that "git gui blame" notifies "git blame"
of the currently displayed window area whenever that changes, and that
git blame then _first_ deals with all chunks with a visible on-screen
part.

> Thanks for doing this. Unfortunately I can't read the patch itself as
> I am also trying to improve JGit's blame code for $DAY_JOB, and JGit
> is BSD licensed.

Shrug.  The patch is functionally equivalent to the previous behavior,
the arrangement of linear lists on underlying data structures is hardly
copyrightable, and Java implements linear lists differently anyhow.
Merging two sorted linear lists is a straightforward operation,
splitting a linear list into several others also is.

The really tricky/expensive part was realizing that as opposed to target
line number ranges, source line number ranges may overlap and/or be
duplicate when using -M or -C options.  That really messed things up for
a long time and was hard to debug.  Once I figured out what was going
wrong, recoding the respective stuff was straightforward.

I doubt that there is much copyrightable material to transfer as I seem
to remember that Java does not have anything like a pointer.  So the
main stuff, juggling with linear lists, would not likely transfer in a
reasonably recognizable manner.

-- 
David Kastrup
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]