Re: [PATCH] Documentation/git-filter-branch.txt: Fix description of --commit-filter

Kevin Ballard <kevin@xxxxxx> · Fri, 30 May 2008 16:07:38 -0700

On May 30, 2008, at 3:52 PM, Junio C Hamano wrote:

Kevin Ballard <kevin@xxxxxx> writes:

The old description was misleading and logically impossible. It  
claimed that
the ancestors of the original commit would be re-written to have  
the multiple
emitted ids as parents. Not only would this modify existing  
objects, but it
would create a cycle. What this actually does is pass the multiple  
emitted ids
to the newly-created children to use as parents.

Signed-off-by: Kevin Ballard <kevin@xxxxxx>
---
Documentation/git-filter-branch.txt |    4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-filter-branch.txt b/Documentation/ 
git-filter-branch.txt
index 506c37a..541bf23 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -113,8 +113,8 @@ OPTIONS
	stdin.  The commit id is expected on stdout.
+
As a special extension, the commit filter may emit multiple
-commit ids; in that case, ancestors of the original commit will
-have all of them as parents.
+commit ids; in that case, the rewritten children of the original  
commit will
+have all of them as parents. You probably don't want to do this.
+

Now I am _very_ confused.

The original description sounds as if:

       In this history, when rewriting commit C, if we emit A from the
       filter:

                    B
                     \
               ---A---C---D

       We will somehow make 'A' and 'B' have A as their parents.

which is wrong as you pointed out.

But I am also confused by the new description:

       In that history, we will make sure that rewritten D (original
       commit being C) have A as parent.  IOW, we will have

               --A'--C'  D'
                        /
                       A

which is not what happens.  What it does is that the commits in the  
output
from the filter (i.e. A) are first mapped to the corresponding  
commits in
the rewritten history (i.e. A'), and they will be used as the  
parents of
the rewritten commit, to form this history:

               --A'--C'

isn't it?

So basically, you think it's missing the fact that the emitted id is  
mapped to rewritten commits? From reading the git-filter-branch code,  
I don't think that's correct. When each commit is created, its  
original parents get mapped to new values, but the results of the  
commit-filter are dumped straight into the map.

To give an example, let's examine your tree. A will be processed  
first, and A' gets put into the map for A. B gets processed next (or  
maybe before A, but that's irrelevant) and B' gets put into the map  
for B. C gets processed, and it emits A, so A goes into the map for C.  
Then D is processed, and its original parent C is looked up in the map  
and A is returned. So, as near as I can tell, that "broken" history is  
exactly what you'll get if the commit-filter returns A for C. This  
means that when you're writing a commit-filter for this, you probably  
want to emit $(map A), not A.

Perhaps the description should be significantly expanded to include  
the diagrams and explanations?

Also you did not defend why you added "You probably don't want to do  
this"
to the description.


Because when the commit-filter emits multiple ids, it's converting the  
child commits into merges without even knowing what the child commits  
will be. I was just trying to warn people away from using this feature  
unless they know exactly what they're doing. Usually you want to use a  
parent-filter if you're converting commits into merges, because that  
way you know exactly what commits you're modifying.

-Kevin

--
Kevin Ballard
http://kevin.sb.org
kevin@xxxxxx
http://www.tildesoft.com


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html