Re: Change set based shallow clone

Linus Torvalds <torvalds@xxxxxxxx> · Fri, 8 Sep 2006 19:56:00 -0700 (PDT)

On Sat, 9 Sep 2006, Paul Mackerras wrote:
> 
> It might get nasty if we have laid out A and then later B, and then C
> comes along and turns out to be a child of A but a parent of B,
> meaning that both B and C have to be put above A.

Right. This is why I would suggest just recomputing the thing entirely, 
instead of trying to make it incremental. It would definitely cause 
re-organization of the tree when you find a new relationship between two 
old commits that basically creates a new ordering between them.

Trying to do that incrementally sounds really really hard. But just 
_detecting_ the situation that you are getting a new commit that has a 
parent that you have already shown (and that thus must go _before_ one of 
the things you've shown already, and implies a new line reaching it) is 
very easy. And then re-generating the graph might not be too bad.
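
The detect-then-recompute idea above can be sketched roughly as follows. 
This is only an illustration in Python, not gitk's actual code: the 
`Layout` class, its `add()` method and the `topo_order()` helper are 
hypothetical names, and commits are assumed to arrive one at a time as 
(id, parent ids) pairs. Rows are listed top to bottom, children above 
parents.

```python
def topo_order(parents_of):
    """Return the known commits with every child before its parents."""
    # Count, for each known commit, how many known children it has.
    n_children = {c: 0 for c in parents_of}
    for ps in parents_of.values():
        for p in ps:
            if p in n_children:
                n_children[p] += 1
    # Commits without known children can be emitted right away.
    ready = [c for c in parents_of if n_children[c] == 0]
    out = []
    while ready:
        c = ready.pop()
        out.append(c)
        for p in parents_of[c]:
            if p in n_children:
                n_children[p] -= 1
                if n_children[p] == 0:
                    ready.append(p)
    return out


class Layout:
    """Hypothetical display list: append cheaply, recompute on conflict."""

    def __init__(self):
        self.parents_of = {}  # every commit seen so far -> its parents
        self.rows = []        # display order, top to bottom
        self.recomputes = 0

    def add(self, commit, parents):
        # The cheap check: is any parent of the new commit already shown?
        # If so, the new commit has to go above something already drawn.
        out_of_order = any(p in self.parents_of for p in parents)
        self.parents_of[commit] = list(parents)
        if out_of_order:
            # Do not patch the layout incrementally, just redo it.
            self.rows = topo_order(self.parents_of)
            self.recomputes += 1
        else:
            self.rows.append(commit)


# The A, B, C case from the quoted text: C is a child of A and a parent of B.
lay = Layout()
lay.add("A", [])
lay.add("B", ["C"])
lay.add("C", ["A"])
print(lay.rows)  # ['B', 'C', 'A']
```

Appending is safe whenever none of the new commit's parents is on screen 
yet, because everything already shown is then allowed to sit above it. 
Only the conflicting case pays for a full re-sort.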

Running "gitk" on the kernel with a hot cache and fully packed (and enough 
memory) isn't too bad, because git rev-list literally takes under a second 
for that case, so the advantage of avoiding --topo-order isn't _that_ 
noticeable. 

And the "fully packed" part is probably the most important part. A packed 
Linux historic tree takes just under six seconds cold-cache and under two 
seconds hot-cache, but that's because pack-files are _really_ good at 
mapping all the commits in just one go, and at the beginning of the 
pack-file.

But try the same thing with a fully unpacked kernel, and you'll see the 
real pain of having to traverse all of history. We're talking minutes, 
even when hot in the cache.

> Another thing I have been thinking of is that gitk probably should
> impose a time limit of say 3 months by default

I don't think time is good, because for an old project (like the Linux 
historic repo), three months is literally nothing. So it would be much 
better to go simply by numbers, but sadly, that won't help - simply 
because if we want to know the 100 first commits in --topo-order, we'll do 
the whole history and sort it _first_, and then do the "pick top 100 
commits" only after that.

(Which may not be sensible, of course, but it's what we do. It's almost 
impossible to do it the other way, because you won't know until the point 
where you do "get_revision()" if you are _actually_ going to use a commit 
or not, so counting them before the end is fundamentally very hard).
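
To make the cost difference concrete, here is a simplified Python model 
of the two walks. It is not git's revision walker: `walk_by_date()` and 
`walk_topo()` are hypothetical names, history is assumed to be a dict 
mapping each commit to its parents, and the model ignores the filtering 
done at "get_revision()" time. It only shows that a date-ordered walk 
can stop once it has enough commits, while a sort-first topological 
walk visits everything reachable before it emits anything.

```python
import heapq


def walk_by_date(tips, parents_of, date_of, limit):
    """Newest-first walk that stops as soon as `limit` commits are out."""
    seen = set(tips)
    heap = [(-date_of[t], t) for t in seen]
    heapq.heapify(heap)
    out = []
    while heap and len(out) < limit:
        _, c = heapq.heappop(heap)
        out.append(c)
        for p in parents_of[c]:
            if p not in seen:
                seen.add(p)
                heapq.heappush(heap, (-date_of[p], p))
    return out, len(seen)  # second value: commits touched


def walk_topo(tips, parents_of, limit):
    """Sort-first topological walk: the limit is applied only at the end."""
    # Pass 1: visit all reachable history to learn who has which children.
    n_children = {}
    stack = list(tips)
    while stack:
        c = stack.pop()
        if c in n_children:
            continue
        n_children[c] = 0
        stack.extend(parents_of[c])
    for c in n_children:
        for p in parents_of[c]:
            n_children[p] += 1
    # Pass 2: emit children before parents, and only now honour the limit.
    ready = [t for t in dict.fromkeys(tips) if n_children[t] == 0]
    out = []
    while ready and len(out) < limit:
        c = ready.pop()
        out.append(c)
        for p in parents_of[c]:
            n_children[p] -= 1
            if n_children[p] == 0:
                ready.append(p)
    return out, len(n_children)  # second value: commits touched
```

On a linear history of 1000 commits with a limit of 100, the date walk 
in this model touches 101 commits, while the topological walk touches 
all 1000 before returning the same 100.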

> Together with a menu to select the time limit, I think that would be 
> quite usable and would make gitk start up *much* faster.

The menu would help, of course. But it would be even nicer if you'd be 
able to make do without the --topo-order.

		Linus