Re: [PATCH 1/2] blame: large-scale performance rewrite

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Apr 26, 2014 at 12:48 AM, David Kastrup <dak@xxxxxxx> wrote:
> Shawn Pearce <spearce@xxxxxxxxxxx> writes:
>
>> On Fri, Apr 25, 2014 at 4:56 PM, David Kastrup <dak@xxxxxxx> wrote:
>>> The previous implementation used a single sorted linear list of blame
>>> entries for organizing all partial or completed work.  Every subtask had
>>> to scan the whole list, with most entries not being relevant to the
>>> task.  The resulting run-time was quadratic to the number of separate
>>> chunks.
>>>
>>> This change gives every subtask its own data to work with.  Subtasks are
>>> organized into "struct origin" chains hanging off particular commits.
>>> Commits are organized into a priority queue, processing them in commit
>>> date order in order to keep most of the work affecting a particular blob
>>> collated even in the presence of an extensive merge history.
>>
>> Without reading the code, this sounds like how JGit runs blame.
>>
>>> For large files with a diversified history, a speedup by a factor of 3
>>> or more is not unusual.
>>
>> And JGit was already usually slower than git-core. Now it will be even
>> slower! :-)
>
> If your statement about JGit is accurate, it should likely have beat Git
> for large use cases (where the performance improvements are most
> important) as O(n) beats O(n^2) in the long run.

Agreed.

In a few cases yes, JGit did beat git-core at blame running time.
Unfortunately according to my profiling blame performance is still
dominated by inflation and scanning of commit and tree objects to
identify unmodified blobs and advance to the next scapegoat ancestor.

Its entirely possible my blame implementation in JGit is still doing
something stupid. Or its possible JIT translated Java just takes
longer than natively compiled C. I am including JVM startup and JIT
time in the timing comparison with git-core.

> Apart from the objective measurement of "total time", the more
> subjective impression of interactive/incremental response (like in git
> gui blame) where the order of results will significantly differ (current
> git-blame --incremental focuses on getting blames resolved in
> first-lines-first manner, the proposed git-blame rather works on a
> newest-commits-first basis which might better match typical use cases)
> might be worth reporting.

Seeing this fill during execution was the initial motivation I had for
writing git gui blame. I don't think anyone cares about the order it
displays in. In fact ordering my timestamp may be more what the user
wants anyway, as you suggest above.

Thanks for doing this. Unfortunately I can't read the patch itself as
I am also trying to improve JGit's blame code for $DAY_JOB, and JGit
is BSD licensed.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]