Re: Bizarre missing changes (git bug?)

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Tue, 29 Jul 2008 21:52:35 -0700 (PDT)

On Wed, 30 Jul 2008, Jeff King wrote:
> 
> I agree with you, btw. It is definitely correct and useful; however, I
> am curious if there is some "in between" level of simplification that
> might produce an alternate graph that has interesting features. And that
> is why I am trying to get Roman to lay out exactly what it is he wants.

Actually, I know what he wants, since I tried to describe it for the 
filter-branch discussion. It's really not that conceptually complex.

Basically, the stupid model is to just do this:

 - start with --full-history

 - for each merge, look at both parents. If one parent leads directly to 
   a commit that can be reached from the other, just remove that 
   parent as being redundant. And if that removal leads to a merge now 
   becoming a non-merge, and it has no changes wrt its single remaining 
   parent, remove the commit entirely (rewriting any parenthood to make 
   the rest all stay together, of course)

 - repeat until you cannot do any more simplification (removing one commit 
   can actually cause its children to now become targets for this 
   simplification).

and I suspect that

 (a) the stupid model is probably at least O(n^3) if done stupidly and 
     O(n^2) with some modest amount of smarts (keeping a list of at least 
     potential targets of simplification and expanding it only when 
     actually simplifying), but that
 (b) you can concentrate on just the merges that the current optimizing 
     algorithm would have removed, so 'n' is not the total number of 
     commits, but at most the number of merges, and more likely actually 
     just the number of trivial merges in that file, and finally
 (c) there is likely some smart and efficient graph minimization algorithm 
     that is O(n log n) or something.

so I don't think it's likely to be hugely more expensive than the 
topo-sort is. All the real expense is in the same place as the topo-sort 
expense, namely in generating the list up-front.

I bet googling for "minimal directed acyclic graph" will give pointers.
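
A minimal sketch of the stupid model above, in Python, deliberately done 
stupidly (reachability is recomputed from scratch for every parent pair). 
It is not how git does anything: the names simplify_merges, parents and 
tree are made up for illustration, "redundant" is read as "already 
reachable from another parent", and the input is assumed to be the 
--full-history graph for one path, given as a dict from commit to parent 
list plus a dict from commit to some id of the path's content.

```python
def reachable(parents, start):
    """All commits reachable from `start` by following parent links."""
    seen, stack = set(), [start]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def simplify_merges(parents, tree):
    """Naive fixpoint version of the simplification described above.

    parents: dict commit -> list of parents (assumed --full-history graph)
    tree:    dict commit -> id of the path's content at that commit
    Both structures are hypothetical stand-ins, not git data structures.
    """
    # Work on a copy so the caller's graph is left untouched.
    parents = {c: list(dict.fromkeys(ps)) for c, ps in parents.items()}
    merges = {c for c, ps in parents.items() if len(ps) > 1}
    progress = True
    while progress:  # repeat until no more simplification is possible
        progress = False
        for c in list(parents):
            if c not in merges:
                continue
            ps = parents[c]
            # Drop every parent that another parent already reaches.
            kept = [p for p in ps
                    if not any(q != p and p in reachable(parents, q)
                               for q in ps)]
            if len(kept) < len(ps):
                parents[c] = kept
                progress = True
            # A former merge with no changes wrt its only parent goes away.
            if len(kept) == 1 and tree[c] == tree[kept[0]]:
                del parents[c]
                merges.discard(c)
                # Rewrite parenthood so the children stay connected.
                for child, cps in list(parents.items()):
                    if c in cps:
                        rewired = [kept[0] if p == c else p for p in cps]
                        parents[child] = list(dict.fromkeys(rewired))
                progress = True
    return parents
```

For a history where B follows A and M merges B with A without changing 
the path, this drops A as a parent of M and then splices M out, so any 
child of M ends up pointing straight at B. A merge whose parents are not 
reachable from each other is left alone.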

And despite the fact that I've argued against Roman's world-view, I 
actually _do_ think it would be nice to have that third mode, the same way 
that we have --topo-order. It wouldn't be good for the _default_ view, but 
then neither is --full-history, so that's not a big argument.

That said, I'd like to (again) repeat the caveat that it's probably best 
done in the tool that actually visualizes the mess - exactly for the same 
reason that I argued for the topological sort being done in gitk. It's 
very painful to have to wait for the first few commits to start appearing 
in the history window.

Admittedly most of my work is actually done on machines that are pretty 
fast, but every once in a while I travel with a laptop. And more 
importantly, not everybody gets new hardware from Intel for testing even 
before the CPU has been released. So others will still appreciate 
incremental history updates, even if my machine might be fast enough (and 
my kernel tree always in the caches) that I myself could live with a 
synchronous version a-la --topo-order.

			Linus