Re: rev-list --parents --full-history + path: something's fishy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Sun, 25 May 2008, Johannes Sixt wrote:
> 
> The history was this:
> 
>    C--M
>   /  /
>  A--B
> 
> Now assume that both B and C change a, but so that it is identical in both B 
> and C. I thought that --full-history makes a difference *only* for this case, 
> because without --full-history the revision walk would choose either B or C 
> (not quite at random, but in an unspecified manner), but not both; but 
> with --full-history the revision walk would go both paths.

You mis-understood what --full history does.

It literally just means "walk all paths". The fact that *also* means that 
it often cannot simplify away merges at all is unfortunate, but there you 
have it.

In other words, it will now leave the merge alone, even if it can see that 
everything we care about (file 'a') came from just one side.

> This makes a difference in git-filter-branch --subdirectory-filter: We do want 
> to simplify history to those commits that touch a path, but we don't want to 
> simplify away the case outlined in the previous paragraph.

If so, you need to expand on the history simplification a *lot*. 

It currently does two things:

 - simplify away merges when it can see that one parent is identical in 
   content to the end result - it then removes the merge entirely, and 
   replaces it with the identical parent.

   This is the thing that would normally take the merge 'M', and just 
   replace it with 'B', because it sees that the contents all come from B. 
   But if 'C' _also_ changed the file, and 'M' was actually a data merge, 
   then it will be left alone (because the merge 'M' really is meaningful 
   as far as the data is concerned)

   This is what "--full-history" disables.

   So when you say "--full-history", all merges are left alone.

 - The *other* simplification is the non-merge case, where it simplifies 
   away commits that don't change the files. This is the one that then 
   removes 'C' in your original example.

   You can disable this simplification with "--sparse". Not that anybody 
   ever wants to.

What you seem to want in a *third* level of simplification, which is to 
remove merges that turn out to be pointless, because all parents end up 
being directly related. We don't do that simplification, and we never 
have.

I'd love to do it, but it's somewhat costly and very much more complicated 
than the simplifications we do do.

> What shall we do in git-filter-branch --subdirectory-filter?

See above. I can tell you _where_ you'd need to add the logic. See the 
file revision.c: remove_duplicate_parents(). You'd could "just" extend 
that to remove not just 100% duplicates, but also remove parents that are 
direct ancestors of each other. I say "just" in quotes, because that's not 
trivial to do efficiently.

But it gets worse. If that removal of parents then turns the commit into a 
regular one, *and* it didn't actually change the files you are interested 
in, you'd need to remove it entirely, which in turn means that you'd also 
need to rewrite the parent information of the children. Which means that 
simplify_commit() is actually too late, because that one happens when you 
print things out, and the children have already been returned (with what 
now is stale parenthood!).

So the _obvious_ place to do that simplification is actually too late. 

But doing it earlier is also hard, because that "simplify_commit()" is 
what removes the trivial linear ones.

So you'd actually need to add a whole new phase that removes the trivial 
linear cases *before* we are in the whole get_revision() phase, and then 
does the commit simplification. It's nasty and quite complicated. Which is 
why we don't do it.

The "revision.c" commit history simplification is already arguably some of 
the most complicated and subtle code in all of git. The code needs to be 
able to handle the "normal" case (which is to stream the commits without 
pre-computing the whole DAG) efficiently, but then there are those 
complicated cases that need the whole DAG. And they have to live 
side-by-side.

			Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux