On Sun, 6 May 2007, Karl Hasselström wrote:
>
> OK, now I've tested it, and just as you said, it works (and is _very_
> useful) but looks like crap. :-)
>
> Is there any fundamental reason why
>
>   gitk -- some/path/name
>
> generates a nice, connected graph, while
>
>   gitk -S'some string'
>
> generates disconnected spaghetti?

There is a reason, and it's fairly fundamental: the path limiting code is
deeply embedded in the revision walking, and I've spent a fair amount of
effort on making that work and efficient as hell (it's one of the few
areas in git where I'm probably still the main author).

Because it's literally what I do 90% of the time: for me, the
path-limiting code is basically _the_ most important git feature, and I
care very deeply.

In contrast, the "-S" thing is not actually part of the revision walking
at all, and is a totally separate phase that is done when revisions are
_shown_. I almost never use it myself, and it grew out of a totally
separate effort by Junio.

> Or could the latter be made to use the same parent-rewriting logic as
> the first?

It would probably be possible to make the -S logic be another part of the
"prune_fn()" logic in revision.c, and it might even simplify some of the
logic, but I suspect it would actually suck really really badly from a
performance standpoint.

Why? Because the prune_fn() logic is done when we generate the revision
graph, which is generally something that a lot of the operations have to
do up-front before they can do _anything_ else. Eg, any revision limiter
(and that's a very common case) like "v2.6.21.." will cause the revision
pruning to happen synchronously and early on.

And the path-limiting is *fast*. It's so incredibly fast that people
don't really realize how fast it is. And it absolutely needs to be fast,
because when you do something like "gitk v2.6.18.. drivers/" on the
kernel you end up doing a _lot_ of tree comparisons.
It's why I'm pretty sure nobody else can ever do what git does - it takes
full advantage of how git can tell that a whole subdirectory hasn't
changed without even recursing into it.

In contrast, "-S" is _slow_. It's a really really expensive operation.
Git makes generating diffs faster than just about anything else, but it's
still really expensive.

This is a really unfair comparison, but:

    time git log drivers/net/ > /dev/null

    real    0m1.488s
    user    0m1.444s
    sys     0m0.040s

ie we can do the log pruning for the whole kernel git history on a
subdirectory in less than two seconds.

Try to compare it with

    time git log -Sdrivers/net/ > /dev/null

and I suspect you won't have the patience to wait for the end result.

And yeah, the operations are fundamentally very very different, and yes,
the latter operation is really really expensive (which is why I said it's
a really unfair comparison). But the point is that the expense comes from
how git has been designed: seeing differences in the paths is cheap by
design (it's how the data structures are laid out), but seeing
differences in actual diffs means that we have to fully generate each
diff for each revision!

A different approach to the underlying datastructures could change the
equation. For example, if the fundamental data representation was the
"diff" (rather than the "whole tree") maybe -S would be as fast as path
limiting. But you'd *really* suck for other things.

To summarize a long story: the path limiting is simply more fundamental
in git. Both by design, and then - obviously partly _due_ to that - by
pure effort we've spent on it. It's something very deep and very
important. In comparison, the -S thing is a cute extra feature, nothing
really "deep".

        Linus
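
[Illustration, not part of the original message.] The point above about
telling that a whole subdirectory hasn't changed "without even recursing
into it" can be sketched roughly as below. This is a toy model, not git's
actual object format or its revision.c code: blob(), tree() and
path_differs() are hypothetical helpers, the hashing scheme is simplified,
and the directory and file names are made up. The only idea it is meant to
show is that when a tree's id is derived from the ids of its children,
equal ids at any level mean everything underneath is equal too.

```python
import hashlib


def blob(data: bytes) -> dict:
    """Hypothetical blob object: its id is a hash of its content."""
    return {"id": hashlib.sha1(b"blob" + data).hexdigest(), "entries": None}


def tree(entries: dict) -> dict:
    """Hypothetical tree object: its id hashes the (name, child id) pairs.

    Simplified on purpose; real git trees also record modes and use a
    different byte layout.
    """
    h = hashlib.sha1(b"tree")
    for name in sorted(entries):
        h.update(name.encode() + b"\0" + entries[name]["id"].encode())
    return {"id": h.hexdigest(), "entries": entries}


def path_differs(old, new, path: str) -> bool:
    """Return True if `path` differs between two root trees.

    Descends one path component at a time and stops as soon as the two
    objects at the current level share an id: nothing below can differ.
    """
    for part in [p for p in path.split("/") if p]:
        if old is not None and new is not None and old["id"] == new["id"]:
            return False  # whole subtree unchanged, no need to descend
        old = (old["entries"] or {}).get(part) if old else None
        new = (new["entries"] or {}).get(part) if new else None
    if old is None or new is None:
        return old is not new  # present on one side only counts as a change
    return old["id"] != new["id"]


# Two made-up snapshots that differ only in one file under drivers/net/.
before = tree({
    "drivers": tree({"net": tree({"a.c": blob(b"v1")}),
                     "usb": tree({"b.c": blob(b"v1")})}),
    "fs": tree({"c.c": blob(b"v1")}),
})
after = tree({
    "drivers": tree({"net": tree({"a.c": blob(b"v2")}),
                     "usb": tree({"b.c": blob(b"v1")})}),
    "fs": tree({"c.c": blob(b"v1")}),
})

print(path_differs(before, after, "drivers/net/"))
print(path_differs(before, after, "drivers/usb/"))
```

Note what the check never does: it never reads file contents and never
produces a diff. A -S style search has no such shortcut in this model,
because the question it asks is about the content of each change, which
means generating the diff for every revision.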