Hi all, I'm working on a small perl script that needs to reproduce the results of the default git history simplification used by log & rev-list when a list of paths is given. It is important that I reproduce its results exactly. I would love to simply use the rev-list built-in to do my work for me; but I fear that I may have much too many path limiters than the linux command-line can handle (which if I'm correct, can only take so many arguments). I can understand whether a commit is included or not rather easily: we basically only include a commit iff the commit introduces a change to the given path (ie, it is !TREESAME to any of its parents). For merges, I get a little confused. In the simple case of a 2-parent merge, if we find that we are TREESAME with one parent -- it follows that the interesting change must have come from that parent. So we follow it, and can safely assume that the other parent is uninteresting. If we find that we are TREESAME with both parents, it follows that neither parent nor the merge commit is interesting. Therefore, we randomly pick the first parent and move up the graph looking for an interesting commit. Looking at this more closely, the parents we ignored will fall under three scenarios: 1. It finds it way all the way up to a root commit without finding an interesting commit -- which can happen if your repository has multiple root commits. 2. It will move up the graph and eventually find a common ancestor with the first parent. 3. It will find an interesting commit that has an identical/equivalent commit somewhere in the first parent's ancestry. In all 3 cases, it is safe to follow just the first TREESAME parent. It is interesting to note that if are working with a repository with multiple root commits (say, you imported a project without history, and later merged in its upstream history), you could potentially lose that history with this algorithm if you happened to merge it as the second parent. This is not necessarily a flaw (in fact, it may be a feature!), since you have still fully explained the state of your HEAD with respect to the first parent (which is probably your mainline). In the case that neither parent was TREESAME, we can have 2 scenarios: 1. The merge commit itself made the interesting change. 2. The interesting changes came from both parents, in which we would rightfully follow both parents. My first question is this: If the merge commit itself (D) made the interesting change, we could potentially follow an uninteresting parent -- most likely all the way up until we find a common ancestor (A). This would produce a graph that looks like: A---B---D \-------/ (C pruned) Or, if both parents were uninteresting: A---D (both B & C pruned) \---/ I assume that the simplification takes care of this by removing duplicate parents as well as searching for common ancestors? (It is not mentioned in the documentation). My second question is then: Given that we perform an extra simplification pass to remove common ancestors and duplicate parents, what is the purpose of selectively following parents? Is it just for speed? If we always followed all parents and applied our duplicate simplification and common ancestor simplification, we would not always arrive at the same result? In which case, if my application is not concerned with speed (ie, I don't mind following all parents), would I always arrive at the same graph? Thanks, Tommy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html