Re: More precise tag following

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Sat, 27 Jan 2007 12:13:08 -0800 (PST)

On Sat, 27 Jan 2007, Jakub Narebski wrote:
> 
> By the way, in git-blame you can also give the cutoff like in git-log;
> the lines which come from outside given revision range either get blamed
> on boundary, or are shown "unblamed".

Well, that's not actually all that useful. It's ok when nothing else 
works, but it would be nicer if it just acted well "by default".

Because you generally don't know a priori where your point of interest 
lies.

This is why "git log -p" (or "git whatchanged" before it) is so nice. A 
streaming format means that you get the stuff you likely care about soon, 
but if you aren't quite sure about where it was, it will come _eventually_ 
as you page down. And you can decide at any point in the middle that "ok, 
the thing I was looking for is obviously ancient", which may end up 
changing your whole outlook on a problem.

Which is why I think "incremental" things are so important.

In Sydney at Linux.conf.au I talked a bit to Paul Mackerras about gitk, 
and gitk is _fairly_ good at doing things incrementally (and apparently it 
is internally better at it than I had realized), but by default it still 
passes "--topo-order" to git-rev-list.

Which makes git-rev-list totally non-incremental, and makes gitk horrible 
to start up with default arguments (i.e. none) on a huge repository. If it 
takes 1 minute to walk the whole history, then gitk will take a minute 
before it shows the first commit.

Paulus was saying that it should be easily fixable, and that gitk 
*already* internally has a reorder buffer for commits out of topological 
order (for the "--date-order" thing, aka "gitk -d"), so gitk too should be 
able to stream perfectly well.

And once you can stream, who cares how big the history is? The part that 
is old will take a long time, but people won't even see it, because 
they'll be busy looking at the new parts that they saw immediately.

So this is why I tend to think that doing

	time <fundamental git operation>

is actually not all that interesting. It's a *lot* more interesting in 
many cases to do

	time <fundamental git operation> | head

because that gives a much more accurate view of what the user experience 
is like.
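
A rough sketch of the same measurement in Python, in case you want both 
numbers from a single run. Everything named here (time_first_and_total, 
DEMO_CMD) is made up for illustration, and the demo command is a stand-in 
child process, not git. Note that unlike "| head", this reads all of the 
output instead of closing the pipe early, so the producer always runs to 
completion.

```python
import subprocess
import sys
import time


def time_first_and_total(cmd):
    """Return (seconds until first output line, seconds until exit).

    The first value is None if the command prints nothing at all.
    """
    start = time.perf_counter()
    first = None
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        for _ in proc.stdout:
            if first is None:
                # This is the number the user actually feels.
                first = time.perf_counter() - start
    # Leaving the "with" block waits for the child to exit.
    total = time.perf_counter() - start
    return first, total


# Hypothetical stand-in for a streaming command: prints one line right
# away, then keeps working for a while before printing the rest.
DEMO_CMD = [
    sys.executable,
    "-c",
    "import sys, time; print('first'); sys.stdout.flush(); "
    "time.sleep(0.3); print('rest')",
]

if __name__ == "__main__":
    first, total = time_first_and_total(DEMO_CMD)
    print(f"first line after {first:.3f}s, finished after {total:.3f}s")
```

For a streaming command the first number stays small no matter how large 
the second one gets, which is the whole point.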

To get back to the patch I sent out to "git blame", just to illustrate 
this issue:

	[torvalds@woody linux]$ time git blame --incremental -C block/ll_rw_blk.c > /dev/null
	real    0m8.540s
	user    0m8.109s
	sys     0m0.432s

vs

	[torvalds@woody linux]$ time git blame --incremental -C block/ll_rw_blk.c | head > /dev/null
	real    0m0.238s
	user    0m0.240s
	sys     0m0.004s

and 8.5 seconds is a _loong_ time even for a human, but 0.24 seconds is 
"instant". THAT is the difference between "streaming" and "non-streaming".
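
To show what a consumer of that stream could look like, here is a minimal 
sketch of a reader that hands out blame entries one at a time as they 
arrive. It assumes the incremental format described in the git-blame 
documentation (a header line of "<40-hex sha> <source line> <final line> 
<count>", optional per-commit lines, and a "filename" line closing each 
entry); the format produced by the patch discussed here may differ in 
detail. The function name and the sample data are invented for 
illustration.

```python
import io
import re

# Assumed entry header: "<40-hex sha> <source line> <final line> <count>".
HEADER = re.compile(r"^([0-9a-f]{40}) (\d+) (\d+) (\d+)$")


def iter_blame_entries(stream):
    """Yield one dict per blame entry as soon as its 'filename' line arrives."""
    entry = None
    for raw in stream:
        line = raw.rstrip("\n")
        match = HEADER.match(line)
        if entry is None and match:
            sha, source, final, count = match.groups()
            entry = {
                "sha": sha,
                "source_line": int(source),
                "final_line": int(final),
                "num_lines": int(count),
            }
        elif entry is not None and line.startswith("filename "):
            entry["filename"] = line[len("filename "):]
            yield entry  # the caller can update its display right here
            entry = None
        # Other lines (author, summary, ...) are ignored in this sketch.


# Invented sample input; the hashes and author are not real data.
SAMPLE = "".join([
    "a" * 40 + " 10 12 3\n",
    "author Hypothetical Author\n",
    "filename block/ll_rw_blk.c\n",
    "b" * 40 + " 1 1 2\n",
    "filename block/ll_rw_blk.c\n",
])

if __name__ == "__main__":
    for entry in iter_blame_entries(io.StringIO(SAMPLE)):
        print(entry["sha"][:8], entry["final_line"], entry["num_lines"])
```

Because this is a generator, a viewer can color in lines 12-14 of the 
file as soon as the first entry shows up, without waiting for the rest of 
the blame to finish.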

For a similar example, and seeing why "topo-order" is problematic, just 
try this:

	[torvalds@woody linux]$ time git rev-list --all | head > /dev/null
	real    0m0.007s
	user    0m0.000s
	sys     0m0.012s

vs

	[torvalds@woody linux]$ time git rev-list --topo-order --all | head > /dev/null
	real    0m1.058s
	user    0m1.028s
	sys     0m0.036s

and note how they both just time the first few lines: one takes basically 
no time at all (it's fast *and* streaming) and the other one takes over a 
second (it gets the whole kernel history and then sorts it, so it can't 
stream). A second is still fast for "whole history", but the lack of 
streaming means that it's two orders of magnitude slower IN PRACTICE.
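
The difference can be modeled in a few lines. This is a toy sketch, not 
git's actual implementation: the five-commit history, the function names 
and the "loaded" list (which records every commit the walker had to read) 
are all invented for illustration. One walker emits commits newest-first 
from a priority queue, the other does a plain topological sort that has to 
count children across the whole history before it can emit anything.

```python
import heapq
from collections import deque

# Hypothetical toy history: commit -> (timestamp, parents). Not real data.
HISTORY = {
    "e": (5, ["c", "d"]),
    "d": (4, ["b"]),
    "c": (3, ["b"]),
    "b": (2, ["a"]),
    "a": (1, []),
}


def walk_by_date(history, tips, loaded):
    """Yield commits newest-first, loading each commit only when reached."""
    heap, seen = [], set(tips)
    for tip in tips:
        loaded.append(tip)
        heapq.heappush(heap, (-history[tip][0], tip))
    while heap:
        _, commit = heapq.heappop(heap)
        yield commit  # output starts before any parent has been read
        for parent in history[commit][1]:
            if parent not in seen:
                seen.add(parent)
                loaded.append(parent)
                heapq.heappush(heap, (-history[parent][0], parent))


def walk_topo(history, tips, loaded):
    """Yield children before parents; loads all history before yielding."""
    # Pass 1: visit every reachable commit to count its children.
    children = {}
    stack, seen = list(tips), set(tips)
    while stack:
        commit = stack.pop()
        loaded.append(commit)
        children.setdefault(commit, 0)
        for parent in history[commit][1]:
            children[parent] = children.get(parent, 0) + 1
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    # Pass 2: Kahn's algorithm; emit a commit once all its children are out.
    ready = deque(c for c in tips if children[c] == 0)
    while ready:
        commit = ready.popleft()
        yield commit
        for parent in history[commit][1]:
            children[parent] -= 1
            if children[parent] == 0:
                ready.append(parent)


if __name__ == "__main__":
    for walker in (walk_by_date, walk_topo):
        loaded = []
        first = next(walker(HISTORY, ["e"], loaded))
        print(walker.__name__, "first:", first, "commits loaded:", len(loaded))
```

Asking each walker for just its first commit (the "| head" case) loads 
one commit in the date-ordered walk and all five in the topological one. 
Scale the five up to the whole kernel history and you get the numbers 
above.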

So it's really the *second* case we want to avoid. We want to avoid 
teaching people bad manners, and here "bad manners" is not "having large 
repositories with lots of history", but simply means "do operations that 
fundamentally depend on all of history".

This is why I would much prefer the "--incremental" blame. Suddenly, that 
turns "git blame" from a non-streaming (and thus fundamentally broken) 
operation into something that streams and can thus have a nice user 
experience.

			Linus