On Sun, 25 May 2008, Johannes Sixt wrote: > > The history was this: > > C--M > / / > A--B > > Now assume that both B and C change a, but so that it is identical in both B > and C. I thought that --full-history makes a difference *only* for this case, > because without --full-history the revision walk would choose either B or C > (not quite at random, but in an unspecified manner), but not both; but > with --full-history the revision walk would go both paths. You mis-understood what --full history does. It literally just means "walk all paths". The fact that *also* means that it often cannot simplify away merges at all is unfortunate, but there you have it. In other words, it will now leave the merge alone, even if it can see that everything we care about (file 'a') came from just one side. > This makes a difference in git-filter-branch --subdirectory-filter: We do want > to simplify history to those commits that touch a path, but we don't want to > simplify away the case outlined in the previous paragraph. If so, you need to expand on the history simplification a *lot*. It currently does two things: - simplify away merges when it can see that one parent is identical in content to the end result - it then removes the merge entirely, and replaces it with the identical parent. This is the thing that would normally take the merge 'M', and just replace it with 'B', because it sees that the contents all come from B. But if 'C' _also_ changed the file, and 'M' was actually a data merge, then it will be left alone (because the merge 'M' really is meaningful as far as the data is concerned) This is what "--full-history" disables. So when you say "--full-history", all merges are left alone. - The *other* simplification is the non-merge case, where it simplifies away commits that don't change the files. This is the one that then removes 'C' in your original example. You can disable this simplification with "--sparse". Not that anybody ever wants to. What you seem to want in a *third* level of simplification, which is to remove merges that turn out to be pointless, because all parents end up being directly related. We don't do that simplification, and we never have. I'd love to do it, but it's somewhat costly and very much more complicated than the simplifications we do do. > What shall we do in git-filter-branch --subdirectory-filter? See above. I can tell you _where_ you'd need to add the logic. See the file revision.c: remove_duplicate_parents(). You'd could "just" extend that to remove not just 100% duplicates, but also remove parents that are direct ancestors of each other. I say "just" in quotes, because that's not trivial to do efficiently. But it gets worse. If that removal of parents then turns the commit into a regular one, *and* it didn't actually change the files you are interested in, you'd need to remove it entirely, which in turn means that you'd also need to rewrite the parent information of the children. Which means that simplify_commit() is actually too late, because that one happens when you print things out, and the children have already been returned (with what now is stale parenthood!). So the _obvious_ place to do that simplification is actually too late. But doing it earlier is also hard, because that "simplify_commit()" is what removes the trivial linear ones. So you'd actually need to add a whole new phase that removes the trivial linear cases *before* we are in the whole get_revision() phase, and then does the commit simplification. It's nasty and quite complicated. Which is why we don't do it. The "revision.c" commit history simplification is already arguably some of the most complicated and subtle code in all of git. The code needs to be able to handle the "normal" case (which is to stream the commits without pre-computing the whole DAG) efficiently, but then there are those complicated cases that need the whole DAG. And they have to live side-by-side. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html