On Sat, 27 Jan 2007, Jakub Narebski wrote:
>
> By the way, in git-blame you can also give the cutoff like in git-log;
> the lines which come from outside given revision range either get blamed
> on boundary, or are shown "unblamed".

Well, that's not actually all that useful. It's ok when nothing else
works, but it would be nicer if it just acted well "by default". Because
you generally don't know a priori where your point of interest lies.

This is why "git log -p" (or "git whatchanged" before it) is so nice. A
streaming format means that you get the stuff you likely care about soon,
but if you aren't quite sure about where it was, it will come
_eventually_ as you page down. And you can decide at any point in the
middle that "ok, the thing I was looking for is obviously ancient", which
may end up changing your whole outlook on a problem.

Which is why I think "incremental" things are so important.

In Sydney at Linux.conf.au I talked a bit to Paul Mackerras about gitk,
and gitk is _fairly_ good at doing things incrementally (and apparently
it is internally better at it than I have realized), but by default it
still passes "--topo-order" to git-rev-list. Which turns git-rev-list
totally non-incremental, and makes gitk horrible to start up with default
arguments (ie none) on a huge repository. If it takes 1 minute to walk
the whole history, then gitk will take a minute before it shows the first
commit.

Paulus was saying that it should be easily fixable, and that gitk
*already* internally has a reorder buffer for commits out of topological
order (for the "--date-order" thing, aka "gitk -d"), so gitk too should
be able to stream perfectly well.

And once you can stream, who cares how big the history is? The part that
is old will take a long time, but people won't even see it, because
they'll be busy looking at the new parts that they saw immediately.

So this is why I tend to think that doing

    time <fundamental git operation>

is actually not all that interesting.
It's a *lot* more interesting in many cases to do

    time <fundamental git operation> | head

because that gives a much more accurate view of what the user experience
is like.

To get back to the patch I sent out to "git blame", just to illustrate
this issue:

    [torvalds@woody linux]$ time git blame --incremental -C block/ll_rw_blk.c > /dev/null

    real    0m8.540s
    user    0m8.109s
    sys     0m0.432s

vs

    [torvalds@woody linux]$ time git blame --incremental -C block/ll_rw_blk.c | head > /dev/null

    real    0m0.238s
    user    0m0.240s
    sys     0m0.004s

and 8.5 seconds is a _loong_ time even for a human, but 0.24 seconds is
"instant".

THAT is the difference between "streaming" and "non-streaming".

For a similar example, and seeing why "topo-order" is problematic, just
try this:

    [torvalds@woody linux]$ time git rev-list --all | head > /dev/null

    real    0m0.007s
    user    0m0.000s
    sys     0m0.012s

vs

    [torvalds@woody linux]$ time git rev-list --topo-order --all | head > /dev/null

    real    0m1.058s
    user    0m1.028s
    sys     0m0.036s

and note how they both just time the first few lines: one takes basically
no time at all (it's fast *and* streaming) and the other one takes over a
second (it gets the whole kernel history and then sorts it - so it can't
stream. A second is still fast for "whole history", but the lack of
streaming means that it's two orders of magnitude slower IN PRACTICE).

So it's really the *second* case we want to avoid. We want to avoid
teaching people bad manners, and here "bad manners" is not "having large
repositories with lots of history", but simply means "do operations that
fundamentally depend on all of history".

This is why I would much prefer the "--incremental" blame. Suddenly, that
turns "git blame" from a non-streaming (and thus fundamentally broken)
operation into something that streams and can thus have a nice user
experience.
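
The "| head" comparison above can be illustrated with a toy model. This is only a sketch, not how git is implemented: `walk_history`, `streaming`, `sort_everything_first` and `head_cost` are hypothetical names, commits are plain integers, and a simple sort stands in for the "read everything, then order it" step. Instead of wall-clock time it counts how many commits had to be walked before the first ten lines were available.

```python
import itertools


def walk_history(n_commits, stats):
    """Toy history walk: yield fake commit ids newest-first, counting the work."""
    for i in range(n_commits):
        stats["walked"] += 1      # stand-in for the cost of reading one commit
        yield n_commits - i       # fake "commit id"; larger means newer


def streaming(n_commits, stats):
    """Emit each commit as soon as it is found."""
    yield from walk_history(n_commits, stats)


def sort_everything_first(n_commits, stats):
    """Collect the whole history, sort it, and only then emit anything."""
    commits = sorted(walk_history(n_commits, stats), reverse=True)
    yield from commits


def head_cost(producer, n_commits, lines=10):
    """Rough analogue of `<operation> | head`: stop after `lines` items.

    Returns the first lines and how many commits were walked to get them.
    """
    stats = {"walked": 0}
    first = list(itertools.islice(producer(n_commits, stats), lines))
    return first, stats["walked"]


if __name__ == "__main__":
    for producer in (streaming, sort_everything_first):
        first, walked = head_cost(producer, 100000)
        print(f"{producer.__name__}: {len(first)} lines after walking {walked} commits")
```

Both producers hand back the same first ten lines, but the streaming one stops after walking ten commits while the sorting one has to walk all of them first. That gap is what the "| head" timings measure on a real repository.
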

            Linus