Re: [RFH] filter-branch: ancestor detection weirdness

Johannes Schindelin wrote:
> 
> On Fri, 8 Aug 2008, Thomas Rast wrote:
> 
> > diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> > index 182822a..52b2bdf 100755
> > --- a/git-filter-branch.sh
> > +++ b/git-filter-branch.sh
> > @@ -325,15 +325,9 @@ while read ref
> >  do
> >  	sha1=$(git rev-parse "$ref"^0)
> >  	test -f "$workdir"/../map/$sha1 && continue
> > -	# Assign the boundarie(s) in the set of rewritten commits
> > -	# as the replacement commit(s).
> > -	# (This would look a bit nicer if --not --stdin worked.)
> > -	for p in $( (cd "$workdir"/../map; ls | sed "s/^/^/") |
> > -		git rev-list $ref --boundary --stdin |
> > -		sed -n "s/^-//p")
> > -	do
> > -		map $p >> "$workdir"/../map/$sha1
> > -	done
> > +	# Assign the first commit not pruned as the replacement.
> > +	candidate=$(git rev-list $ref -1 -- "$filter_subdir")

I think I see the actual problem.  I made a small testing repository
with history that looks like this:

*   a6f2213... (refs/heads/master) Merge branch 'side'
|\
| * 311f888... (refs/heads/side) outside
| * 472893d... inside dir
* | 9bd52bc... (refs/heads/stale) outside
* | d1b451a... inside dir
|/
* 1c48eea... initial

It is available at

  git://persephone.dnsalias.net/git/filtertest.git

if you want to try.  All commits labelled 'inside dir' do something in
dir/; the others don't.  (You can disregard the 'other' branch for
now; I wanted to test the behaviour on completely disconnected history
too, since that's the case with Jan's repo.)

Let's depict this as the following for now, where capitals stand for
"interesting" commits under the subdirectory filter:

   i -- A -- b(stale) -- M(master)
    \                   /
     \- C -- d(side) --/

When saying

  $ git filter-branch --subdirectory-filter dir -- --all

I would expect the history to look like:

   A(stale) -- M(master)
              /
   C(side) --/

I think treating it this way makes a lot of sense; you get the last
state that your subdirectory had on the corresponding branch or tag.
(Similarly, a leaf branch that does not affect 'dir' should be backed
up until it hits an ancestor that survives the filter.)

Now the problem with the above ancestor detection is the following.
Consider that at this point, the 'map' directory contains the
(unfiltered) SHA1 for every commit that was rewritten during the
filtering process, i.e.

  $ git rev-list --all -- dir | git name-rev --stdin
  093c591b3d751ce778b4a6e5c2a0906b097b5868 (other~1)
  a6f22134f8ab8bcc762949df53f674e3410f7fc3 (master)
  d1b451a4b0657ea894fd772fc609f7863b7dfd15 (stale~1)
  472893d579383f56f006ff42c563dcbb730bc5b8 (side~1)

So 'map' has entries for M, A, and C (plus other~1 from the
disconnected branch).  Now if you expand the call

  (cd "$workdir"/../map; ls | sed "s/^/^/") |
          git rev-list $ref --boundary --stdin

you'll find that during ref=refs/heads/side, it is equivalent to

  $ git rev-list side --boundary ^master ^side~1 ^stale~1 ^other~1
  [no output!]

Oops, that is not what we wanted.  'side' is already reachable from
'master', so the '^master' exclusion suppresses all output, boundary
commits included.
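
To see how the exclusion list is built, here is a minimal sketch of
the 'ls | sed' step on its own, without git.  The scratch directory is
only a stand-in for "$workdir"/../map (an assumption for illustration);
the file names are the four SHA1s listed above.

```shell
#!/bin/sh
# Stand-in for "$workdir"/../map: one empty file per rewritten commit.
map=$(mktemp -d)
for sha1 in \
	093c591b3d751ce778b4a6e5c2a0906b097b5868 \
	a6f22134f8ab8bcc762949df53f674e3410f7fc3 \
	d1b451a4b0657ea894fd772fc609f7863b7dfd15 \
	472893d579383f56f006ff42c563dcbb730bc5b8
do
	: >"$map/$sha1"
done

# Same pipeline as in filter-branch: prefix every name with '^' so that
# rev-list reads each one as "exclude this commit and its ancestors".
negated=$(cd "$map" && ls | sed "s/^/^/")
printf '%s\n' "$negated"
```

Each printed line is a negated revision, which is why the entry for
master ends up on the exclusion list while 'side' is being processed.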

So now that I've finally understood what is going on, I think a more
careful use of rev-list -1 is actually a correct and easy way to
figure out an ancestor.  Patch follows.
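
For anyone who wants to check this without the test repository, the
following is a rough sketch that rebuilds a history of the same shape
in a throwaway repository and runs both queries.  The file names, the
identity settings and the commit() helper are made up for
illustration; only the branch names and the two rev-list invocations
come from the discussion above.

```shell
#!/bin/sh
# Sketch: rebuild i -- A -- b -- M with a side branch C -- d.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from the mail
git config user.name "Example"
git config user.email "example@example.invalid"

commit () {   # commit <file> <message>: append to <file> and commit it
	mkdir -p "$(dirname "$1")"
	echo "$2" >>"$1"
	git add "$1"
	git commit -q -m "$2"
}

commit top.txt "initial"                     # i
git checkout -q -b side
commit dir/c.txt "inside dir (side)"         # C
commit side.txt "outside (side)"             # d
git checkout -q master
commit dir/a.txt "inside dir"                # A
commit b.txt "outside"                       # b
git branch stale
git merge -q -m "Merge branch 'side'" side   # M

# Old approach: exclude every rewritten commit (M, C, A), ask for boundaries.
boundary=$(git rev-list side --boundary ^master ^side~1 ^stale~1 |
	sed -n "s/^-//p")

# rev-list -1 approach: newest commit reachable from the ref that touches dir/.
side_candidate=$(git rev-list -1 side -- dir)
stale_candidate=$(git rev-list -1 stale -- dir)

echo "boundary commits for side: '$boundary'"
echo "candidate for side:  $side_candidate"
echo "candidate for stale: $stale_candidate"
```

In this layout the boundary query should come back empty for 'side',
while the two candidates should be the same commits as side~1 and
stale~1, i.e. C and A in the diagram.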

- Thomas

-- 
Thomas Rast
trast@xxxxxxxxxxxxxxx
