[PATCH/BAD 4/4] subtree: poor bugfix for split new commits with parents before previous split

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Bug description: Unless you use --ignore-joins, "git subtree split"'s
optimization to avoid re-scanning all of history can trim too much.
Any new merged branches that have parents before the previous "split"
will not be re-attached properly in the split-off subtree.
In the extreme case (if all the changes occurred on such
branches), I've seen the old subtree head might not be an
ancestor of the new head at all.

This "fix" is only for illustration.  It probably should not be
included as-is; it probably scales worse than using --ignore-joins
for anything much beyond trivial cases.

I'm not sure it is possible to speed it up much (while remaining
correct) without switching to a better language than shell script.

I'm not sure how to explain the constraint clearly (or even
precisely define the "minimal" constraint).  One attempt:
Exclude from "unrevs" any rejoin commit X for which ANY new commit
(not just revs itself) has some other common ancestor (merge base)
besides X.  ("New commit" is one that is not an ancestor of
some previous rejoin commit.)

It would probably be fairly straightforward to adapt
graph traversal algorithms to do this efficiently, but as near
as I can tell, none of the existing options in git-rev-list or
git-merge-base really does anything useful for optimizing this
in shell script...

The simplest traversal technique to fix this might
be if there was a way to make history traversal only stop
at SPECIFIC unrevs, instead of any ancestor of any unrevs.

Other workarounds besides this patch:
   * Use --ignore-joins and wait for the really slow processing...
   * Plan ahead and delay: After using git subtree split --rejoin,
     leave the rejoin merge commit off on a side branch until such time
     that all other pre-split branches have been merged.  Then
     "merge the merge" at that later time.  Only do rejoins fairly
     rarely, based on the schedule of merging other branches.
   * Ignore the problem: Allow the occasional falsely-disconnected
     branch roots to be generated by split.  Of course, this
     means --ignore-joins would no longer generate a consistent
     subtree history, it is hard to examine what actually changed
     in the false roots, etc.
   * Perhaps manually use temporary grafts or similar to hide rejoins
     that would throw it off when doing later splits.

[Intentionally not signed off: Poor performance.]
---

Testing: Below I include a script I've been using to help with
mostly-manual testing.  I regularly modify it based on
what I'm testing at the moment.  It basically
creates and manipulates a subtree out of git's own "contrib"
directory.

It may be convenient to use this on a copy of git's
repository instead of the copy in which you are messing with
git-subtree (avoiding changing code out from under the script).
It may also be convenient to symlink git-subtree to the top-level
directory, rather than copy it, so you don't need to keep
re-copying it.

------CUT------
#!/bin/sh

die()
{ echo "$@" 1>&2
  exit 1
}

GIT=../git-sandbox/bin-wrappers/git
#GIT=git

DBG=
#DBG=-d

EDIT=
#EDIT=--edit

git branch -D br-contrib

git checkout 73bbc0796b4ce65bfb1a12b47a0edc27845ecf50 || die "checkout"

"$GIT" subtree split -P contrib \
         --branch br-contrib --squash --rejoin $EDIT || die "subtree 1"

git tag -f step1

git merge -m 'MY MERGE1' 68a65f5fe54c2b21bfe16ef3a0b48956ecf5658a ||
        die "merge 1"

"$GIT" subtree split -P contrib \
         --branch br-contrib --squash --rejoin || die "subtree 2"

git tag -f step2

git merge -m 'MY MERGE2' 15f7221686eac053902b906c278680b485c865ce || \
        die "merge 2"

"$GIT" subtree split -P contrib \
         --branch br-contrib --squash --rejoin $EDIT || die "subtree 3"

#####
if [ -n "$EDIT" ]; then
    exit 0
fi

git tag -f step3

git merge -m 'MY MERGE3' 26145c9c73ed51bbd8261949d31899d6507519d5 || \
        die "merge 3"

"$GIT" subtree split -P contrib \
         $DBG --branch br-contrib --squash --rejoin || die "subtree 4"

git tag -f step4

git merge -m 'MY MERGE4' 583736c0bcf09adaa5621b142d8e43c22354041b || \
        die "merge 4"

"$GIT" subtree split -P contrib \
         --branch br-contrib --squash --rejoin || die "subtree 5"

git tag -f step5
------CUT------



 contrib/subtree/git-subtree.sh | 61 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 55 insertions(+), 6 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index ac82b4d..6ff4362 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -36,6 +36,8 @@ PATH=$PATH:$(git --exec-path)
 
 require_work_tree
 
+nl='
+'
 quiet=
 branch=
 debug=
@@ -224,6 +226,13 @@ try_remove_previous()
 	fi
 }
 
+try_remove()
+{
+	if rev_exists "$1"; then
+		echo "$1"
+	fi
+}
+
 find_latest_squash()
 {
 	debug "Looking for latest squash ($dir)..."
@@ -293,8 +302,8 @@ find_existing_splits()
 					debug "  Prior: $main -> $sub"
 					cache_set $main $sub
 					cache_set $sub $sub
-					try_remove_previous "$main"
-					try_remove_previous "$sub"
+					try_remove "$main"
+					try_remove "$sub"
 				fi
 				main=
 				sub=
@@ -303,6 +312,40 @@ find_existing_splits()
 	done
 }
 
+filter_out_needed_existing()
+{
+	revs="$1"
+	existing="$2"
+	base=
+	git rev-list --merges --parents $revs --not $existing | \
+		sed -e 's/^[^ ]* //' -e "y/ /\\$nl/" | sort | uniq | \
+	{
+		debug "filter_out_needed_existing"
+		debug "  revs: $revs"
+		debug "  existing: $existing"
+		while read needed; do
+			debug "loop: $needed"
+			nextexisting=
+			for exist in $existing ; do
+				tmp="$(git merge-base $exist $needed)"
+				if [ x"$tmp" = x"$exist" -o -z "$tmp" ]; then
+					nextexisting="$nextexisting $exist"
+				fi
+			done
+			existing=$nextexisting
+		done
+		echo "$existing"
+	}
+}
+
+existing_to_unrevs()
+{
+	existing="$1"
+	for exist in $existing ; do
+		try_remove_previous $exist
+	done
+}
+
 copy_commit()
 {
 	# We're going to set some environment vars here, so
@@ -599,10 +642,16 @@ cmd_split()
 		done
 	fi
 	
-	if [ -n "$ignore_joins" ]; then
-		unrevs=
-	else
-		unrevs="$(find_existing_splits "$dir" "$revs")"
+	unrevs=
+	if [ -z "$ignore_joins" ]; then
+		existing="$(find_existing_splits "$dir" "$revs")"
+		if [ -n "$existing" ]; then
+			debug "existing: $existing"
+			existing="$(filter_out_needed_existing "$revs" "$existing")"
+			unrevs="$(existing_to_unrevs "$existing")"
+			debug "base: $base"
+			debug "unrevs: $unrevs"
+		fi
 	fi
 	
 	# We can't restrict rev-list to only $dir here, because some of our
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]