Ok, this is a rough first draft of avoiding the topological sort up-front, and instead incrementally sorting only when actually necessary. The first patch is a pure cleanup. Quite frankly, I suspect the old topological sorting code would have worked perfectly fine as-is, but I couldn't really bear to watch it or think about debugging it the way it was before. The indirection and other strangeness just blew my tiny little mind, and while I bet it had some historical reason, I ended up reworking that thing. I tried to keep it as similar as humanly possible to the old code (because it's easy to get wrong, and because I really didn't want to worry about the toposort itself), but the numbers speak for themselves: 4 files changed, 55 insertions(+), 126 deletions(-) should tell you something. And it means that there are no subtle calling issues with preconditions for using the toposort etc - you can use it over and over again, and it should all be obvious. Knock wood. The second diff is the one that actually adds the new feature, and it undoes most of the nice statistics of the first one: 5 files changed, 70 insertions(+), 12 deletions(-) but we still end up with more deletions than insertions on the whole, *and* a new feature. The new code is triggered by using the "--replay" flag, which will cause certain consistency checks to be done when a commit is shown by the log machinery. In particular: - if we print out a parent SHA1, and the parent has already been shown, that's a topology violation, and causes a replay. - if we turn a commit unintersting, and the commit has already been shown, that's a "uninteresting" violation, and causes a replay. The second one I didn't test at all. It's probably hard to trigger, and I bet there are bugs there, but this is very much a WIP patch, with the hope that people other than me can work on it. When a replay happens, the log code will print out Replay <sha1> for each <sha1> commit that gets invalidated by the replay, and then start *that* part of history anew, with just the required part re-sorted. Should it print just the number of commits? Perhaps. Play around with this. Anyway, to give you some kind of idea of the effect of this, in the current git tree as it exists in my repo, I get 15 replay events, and if you compare the total log output, you see: [torvalds@woody git]$ wc -l t1 t2 170191 t1 178150 t2 where "t1" is without replays, and "t2" is with replays. The replays obviously do add lines (the replayed ones), but at least for git, it's on the order of 5% lines replayed. The big difference is in the latency: time git log --parents --topo-order | head real 0m0.163s vs time git log --parents --replay | head real 0m0.003s ie you can see how the --replay thing starts outputtig commits immediately, because it knows it can just back up and fix any errors that happen. That's the good news. The bad news is that it doesn't work well in this simplistic form, because there is a O(n**2) behaviour when replays *do* happen, ie we end up having replays within replays, and rather than getting a 5% increase like for git itself, for the kernel archive this gets a roughly 50x increase for the replay. So the *latency* still improves dramatically (getting the first one hundred commits in 0.006 seconds vs 1.076s), but because of the bad behaviour wrt cascading replays, it's not really usable in this form (the full log goes from 8 seconds to 22s when writing to /dev/null - and is much worse if the receiver actually has to do something about it). I think the right thing to do wrt this all would be to turn the replay logic into a latency-based one, where it would basically batch things up, but make sure to output something at least every half a second or so. But that's a pretty separate set of logic, so I thought I'd send this out as-is for comments.. Linus - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html