Shawn Pearce <spearce@xxxxxxxxxxx> writes: > On Fri, Apr 25, 2014 at 4:56 PM, David Kastrup <dak@xxxxxxx> wrote: >> The previous implementation used a single sorted linear list of blame >> entries for organizing all partial or completed work. Every subtask had >> to scan the whole list, with most entries not being relevant to the >> task. The resulting run-time was quadratic to the number of separate >> chunks. >> >> This change gives every subtask its own data to work with. Subtasks are >> organized into "struct origin" chains hanging off particular commits. >> Commits are organized into a priority queue, processing them in commit >> date order in order to keep most of the work affecting a particular blob >> collated even in the presence of an extensive merge history. > > Without reading the code, this sounds like how JGit runs blame. > >> For large files with a diversified history, a speedup by a factor of 3 >> or more is not unusual. > > And JGit was already usually slower than git-core. Now it will be even > slower! :-) If your statement about JGit is accurate, it should likely have beat Git for large use cases (where the performance improvements are most important) as O(n) beats O(n^2) in the long run. At any rate, I see that I ended up posting this patch series at the end of the week again which makes for a somewhat lacklustre initial response from those who code Git for a regular living. Apropos: shaking the bugs regarding -M and -C options out of the code had taken a large toll because -M can cause the same or overlapping line regions to be responsible for different target regions and the original code implementing the "straightforward" blame blew up on the overlap. I spent a _lot_ of time tracking down that problem. As I am lousy focusing on more than one task, and as I don't get a regular paycheck anyway, this will have to remain my last contribution to Git if I am not going to recoup my losses. Patch 2 of this series tries giving the community of Git a serious chance at picking that option (I mean, there are literally millions of Git users around with a sizable number profiting) while not being obnoxious about it. My personal guess is that it will fail regarding both objectives. But then I've been surprised before by other free software communities when trying to make those particular two ends meet. At any rate, feedback about the performance of the patch from users disappointed by regular git blame would be welcome. Apart from the objective measurement of "total time", the more subjective impression of interactive/incremental response (like in git gui blame) where the order of results will significantly differ (current git-blame --incremental focuses on getting blames resolved in first-lines-first manner, the proposed git-blame rather works on a newest-commits-first basis which might better match typical use cases) might be worth reporting. -- David Kastrup -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html