On Mon, 11 Dec 2006, Marco Costalba wrote:
>
> Regarding the _normal_ solution we have one more hypothesis to take
> advantage of: git-rev-list, when it has nothing more to read, exits.

Yes. You can just wait for the child exit signal.

However, you seem to continually ignore the thing I've asked you to do
several times: try with a cold-cache situation.

The thing is, using pipes and poll()/select()/epoll()/whatever will not
only be efficient, but it will also WORK CORRECTLY in the presence of a
writer that is _slower_ than the reader.

Right now, you're testing the exact opposite. You're testing the case
where the reader is slower than the writer, which is actually not that
interesting - because if "git-rev-list" is instantaneous, then the
fastest thing to do is to simply _always_ just read the result buffer
directly into memory without any select() loop etc whatsoever.

So just by testing that case (and you've slowed down the reader
artificially even _apart_ from the fact that qgit will probably always
be slower than git-rev-list off a packed and hot-cache environment),
you're always going to skew your results in the "do everything in one
go" direction. But the point of the pipe and the poll() is that it
works nicely even when the data trickles in slowly.

[ Of course, as long as you ask for "--topo-order", you'll never see a
  lot of trickling, and you'll never be able to do really well for the
  cold-cache case. To see the _real_ advantage of pipes, you should
  avoid "--topo-order" entirely, and do it dynamically within qgit,
  repainting the graph as needed. At that point, you'd actually do
  something that gitk can't do at all, namely work well for the
  cold-cache not-very-packed large-repository case. ]

To see this in practice (even with hot caches), do something like the
following on the full historic Linux archive:

	time sh -c "git rev-list HEAD | head"
	time sh -c "git rev-list --topo-order HEAD | head"

where for me, the first one takes 0.002s, and the second one takes
0.878s.

Now THAT is an optimization. Not just "10%" or even "ten times", but
"four HUNDRED times" faster. And why is that? Simply because it only
needed to look at _part_ of the data.

For the exact same reason, if you were to do the topological sort only
on the part that you had _looked_ at first, you'd be able to do these
kinds of several-orders-of-magnitude improvements.

		Linus
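For concreteness, here is a minimal sketch of the kind of pipe + poll()
loop I mean (illustration only, not actual qgit code; a real GUI would
register the pipe fd with its event loop, e.g. a QSocketNotifier in Qt,
instead of blocking in poll()):

	/*
	 * Sketch: spawn git rev-list and consume its output through a
	 * pipe with poll(), so commits can be processed as they trickle
	 * in, and EOF on the pipe doubles as the child-exit notification.
	 * Error handling is mostly omitted for brevity.
	 */
	#include <poll.h>
	#include <stdio.h>
	#include <sys/wait.h>
	#include <unistd.h>

	int main(void)
	{
		int fd[2];
		pid_t pid;
		char buf[4096];
		ssize_t n;

		if (pipe(fd) < 0)
			return 1;

		pid = fork();
		if (pid == 0) {
			/* child: rev-list writes the commit stream into the pipe */
			dup2(fd[1], 1);
			close(fd[0]);
			close(fd[1]);
			execlp("git", "git", "rev-list", "HEAD", (char *)NULL);
			_exit(127);
		}
		close(fd[1]);

		for (;;) {
			struct pollfd pfd = { .fd = fd[0], .events = POLLIN };

			/* wait for the next chunk; a GUI would do other work here */
			if (poll(&pfd, 1, -1) < 0)
				break;
			n = read(fd[0], buf, sizeof(buf));
			if (n <= 0)
				break;	/* EOF or error: the writer has exited */
			fwrite(buf, 1, n, stdout);	/* hand the chunk to the parser */
		}
		close(fd[0]);
		waitpid(pid, NULL, 0);
		return 0;
	}

The point is visible in the read loop: the reader never assumes the
whole commit list is available at once, so it behaves the same whether
rev-list finishes instantly (hot cache) or dribbles output for seconds
(cold cache).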