Many projects are made of a combination of several subprojects/libraries and some application-specific code. In some cases, particularly when the subprojects are all maintained independently, 'git submodule' is the best way to deal with this situation. But if you frequently change the subprojects as part of developing your application, use multiple branches, and sometimes want to push your subproject changes upstream, the overhead of manually managing submodules can be excessive. 'git subtree' provides an alternative mechanism, based around the 'git merge -s subtree' merge strategy. Instead of tracking a submodule separately, you merge its history into your main project, and occasionally extract a new "virtual history" from your mainline that can be easily merged back into the upstream project. The virtual history can be incrementally expanded as you make more changes to the superproject. You would normally then merge the virtual history back into your mainline (the --rejoin option). This results in extra commits in your application that appear to change the same files, but these extra commits will tend to be ignored by git's merge simplification algorithm anyway. For example, gitweb (commit 1130ef3) was merged into git as of commit 0a8f4f0, after which it was no longer maintained separately. But imagine it had been maintained separately, and we wanted to extract git's changes to gitweb since that time, to share with the upstream. You could do this: git subtree split --prefix=gitweb --annotate='(split) ' \ 0a8f4f0^.. --onto=1130ef3 --rejoin If gitweb had originally been merged using 'git subtree add' (or a previous split had been done with --rejoin specified), then you could incrementally produce the list of new changes without needing to remember any commit ids: git subtree split --prefix=gitweb --annotate='(split) ' --rejoin --- Resending just in case it got lost. Don't suppose anyone has any comments on this? It's just a first draft, so please let loose. I'm in desperate need of *some* kind of solution like this, however. - Avery Makefile | 1 + command-list.txt | 1 + git-subtree.sh | 435 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 437 insertions(+), 0 deletions(-) create mode 100755 git-subtree.sh diff --git a/Makefile b/Makefile index 5c8e83a..f14e11c 100644 --- a/Makefile +++ b/Makefile @@ -305,6 +305,7 @@ SCRIPT_SH += git-request-pull.sh SCRIPT_SH += git-sh-setup.sh SCRIPT_SH += git-stash.sh SCRIPT_SH += git-submodule.sh +SCRIPT_SH += git-subtree.sh SCRIPT_SH += git-web--browse.sh SCRIPT_PERL += git-add--interactive.perl diff --git a/command-list.txt b/command-list.txt index fb03a2e..9be4774 100644 --- a/command-list.txt +++ b/command-list.txt @@ -113,6 +113,7 @@ git-stash mainporcelain git-status mainporcelain common git-stripspace purehelpers git-submodule mainporcelain +git-subtree mainporcelain git-svn foreignscminterface git-symbolic-ref plumbingmanipulators git-tag mainporcelain common diff --git a/git-subtree.sh b/git-subtree.sh new file mode 100755 index 0000000..39c377c --- /dev/null +++ b/git-subtree.sh @@ -0,0 +1,435 @@ +#!/bin/bash +# +# git-subtree.sh: split/join git repositories in subdirectories of this one +# +# Copyright (C) 2009 Avery Pennarun <apenwarr@xxxxxxxxx> +# +if [ $# -eq 0 ]; then + set -- -h +fi +OPTS_SPEC="\ +git subtree add --prefix=<prefix> <commit> +git subtree split [options...] --prefix=<prefix> <commit...> +git subtree merge --prefix=<prefix> <commit> +git subtree pull --prefix=<prefix> <repository> <refspec...> +-- +h,help show the help +q quiet +d show debug messages +prefix= the name of the subdir to split out + options for 'split' +annotate= add a prefix to commit message of new commits +onto= try connecting new tree to an existing one +rejoin merge the new branch back into HEAD +ignore-joins ignore prior --rejoin commits +" +eval $(echo "$OPTS_SPEC" | git rev-parse --parseopt -- "$@" || echo exit $?) +. git-sh-setup +require_work_tree + +quiet= +debug= +command= +onto= +rejoin= +ignore_joins= +annotate= + +debug() +{ + if [ -n "$debug" ]; then + echo "$@" >&2 + fi +} + +say() +{ + if [ -z "$quiet" ]; then + echo "$@" >&2 + fi +} + +assert() +{ + if "$@"; then + : + else + die "assertion failed: " "$@" + fi +} + + +#echo "Options: $*" + +while [ $# -gt 0 ]; do + opt="$1" + shift + case "$opt" in + -q) quiet=1 ;; + -d) debug=1 ;; + --annotate) annotate="$1"; shift ;; + --no-annotate) annotate= ;; + --prefix) prefix="$1"; shift ;; + --no-prefix) prefix= ;; + --onto) onto="$1"; shift ;; + --no-onto) onto= ;; + --rejoin) rejoin=1 ;; + --no-rejoin) rejoin= ;; + --ignore-joins) ignore_joins=1 ;; + --no-ignore-joins) ignore_joins= ;; + --) break ;; + esac +done + +command="$1" +shift +case "$command" in + add|merge|pull) default= ;; + split) default="--default HEAD" ;; + *) die "Unknown command '$command'" ;; +esac + +if [ -z "$prefix" ]; then + die "You must provide the --prefix option." +fi +dir="$prefix" + +if [ "$command" != "pull" ]; then + revs=$(git rev-parse $default --revs-only "$@") || exit $? + dirs="$(git rev-parse --no-revs --no-flags "$@")" || exit $? + if [ -n "$dirs" ]; then + die "Error: Use --prefix instead of bare filenames." + fi +fi + +debug "command: {$command}" +debug "quiet: {$quiet}" +debug "revs: {$revs}" +debug "dir: {$dir}" +debug "opts: {$*}" +debug + +cache_setup() +{ + cachedir="$GIT_DIR/subtree-cache/$$" + rm -rf "$cachedir" || die "Can't delete old cachedir: $cachedir" + mkdir -p "$cachedir" || die "Can't create new cachedir: $cachedir" + debug "Using cachedir: $cachedir" >&2 +} + +cache_get() +{ + for oldrev in $*; do + if [ -r "$cachedir/$oldrev" ]; then + read newrev <"$cachedir/$oldrev" + echo $newrev + fi + done +} + +cache_set() +{ + oldrev="$1" + newrev="$2" + if [ "$oldrev" != "latest_old" \ + -a "$oldrev" != "latest_new" \ + -a -e "$cachedir/$oldrev" ]; then + die "cache for $oldrev already exists!" + fi + echo "$newrev" >"$cachedir/$oldrev" +} + +# if a commit doesn't have a parent, this might not work. But we only want +# to remove the parent from the rev-list, and since it doesn't exist, it won't +# be there anyway, so do nothing in that case. +try_remove_previous() +{ + if git rev-parse "$1^" >/dev/null 2>&1; then + echo "^$1^" + fi +} + +find_existing_splits() +{ + debug "Looking for prior splits..." + dir="$1" + revs="$2" + git log --grep="^git-subtree-dir: $dir\$" \ + --pretty=format:'%s%n%n%b%nEND' $revs | + while read a b junk; do + case "$a" in + git-subtree-mainline:) main="$b" ;; + git-subtree-split:) sub="$b" ;; + *) + if [ -n "$main" -a -n "$sub" ]; then + debug " Prior: $main -> $sub" + cache_set $main $sub + try_remove_previous "$main" + try_remove_previous "$sub" + main= + sub= + fi + ;; + esac + done +} + +copy_commit() +{ + # We're doing to set some environment vars here, so + # do it in a subshell to get rid of them safely later + debug copy_commit "{$1}" "{$2}" "{$3}" + git log -1 --pretty=format:'%an%n%ae%n%ad%n%cn%n%ce%n%cd%n%s%n%n%b' "$1" | + ( + read GIT_AUTHOR_NAME + read GIT_AUTHOR_EMAIL + read GIT_AUTHOR_DATE + read GIT_COMMITTER_NAME + read GIT_COMMITTER_EMAIL + read GIT_COMMITTER_DATE + export GIT_AUTHOR_NAME \ + GIT_AUTHOR_EMAIL \ + GIT_AUTHOR_DATE \ + GIT_COMMITTER_NAME \ + GIT_COMMITTER_EMAIL \ + GIT_COMMITTER_DATE + (echo -n "$annotate"; cat ) | + git commit-tree "$2" $3 # reads the rest of stdin + ) || die "Can't copy commit $1" +} + +add_msg() +{ + dir="$1" + latest_old="$2" + latest_new="$3" + cat <<-EOF + Add '$dir/' from commit '$latest_new' + + git-subtree-dir: $dir + git-subtree-mainline: $latest_old + git-subtree-split: $latest_new + EOF +} + +merge_msg() +{ + dir="$1" + latest_old="$2" + latest_new="$3" + cat <<-EOF + Split '$dir/' into commit '$latest_new' + + git-subtree-dir: $dir + git-subtree-mainline: $latest_old + git-subtree-split: $latest_new + EOF +} + +toptree_for_commit() +{ + commit="$1" + git log -1 --pretty=format:'%T' "$commit" -- || exit $? +} + +subtree_for_commit() +{ + commit="$1" + dir="$2" + git ls-tree "$commit" -- "$dir" | + while read mode type tree name; do + assert [ "$name" = "$dir" ] + echo $tree + break + done +} + +tree_changed() +{ + tree=$1 + shift + if [ $# -ne 1 ]; then + return 0 # weird parents, consider it changed + else + ptree=$(toptree_for_commit $1) + if [ "$ptree" != "$tree" ]; then + return 0 # changed + else + return 1 # not changed + fi + fi +} + +copy_or_skip() +{ + rev="$1" + tree="$2" + newparents="$3" + assert [ -n "$tree" ] + + identical= + nonidentical= + p= + gotparents= + for parent in $newparents; do + ptree=$(toptree_for_commit $parent) || exit $? + [ -z "$ptree" ] && continue + if [ "$ptree" = "$tree" ]; then + # an identical parent could be used in place of this rev. + identical="$parent" + else + nonidentical="$parent" + fi + + # sometimes both old parents map to the same newparent; + # eliminate duplicates + is_new=1 + for gp in $gotparents; do + if [ "$gp" = "$parent" ]; then + is_new= + break + fi + done + if [ -n "$is_new" ]; then + gotparents="$gotparents $parent" + p="$p -p $parent" + fi + done + + if [ -n "$identical" ]; then + echo $identical + else + copy_commit $rev $tree "$p" || exit $? + fi +} + +ensure_clean() +{ + if ! git diff-index HEAD --exit-code --quiet; then + die "Working tree has modifications. Cannot add." + fi + if ! git diff-index --cached HEAD --exit-code --quiet; then + die "Index has modifications. Cannot add." + fi +} + +cmd_add() +{ + if [ -e "$dir" ]; then + die "'$dir' already exists. Cannot add." + fi + ensure_clean + + set -- $revs + if [ $# -ne 1 ]; then + die "You must provide exactly one revision. Got: '$revs'" + fi + rev="$1" + + debug "Adding $dir as '$rev'..." + git read-tree --prefix="$dir" $rev || exit $? + git checkout "$dir" || exit $? + tree=$(git write-tree) || exit $? + + headrev=$(git rev-parse HEAD) || exit $? + if [ -n "$headrev" -a "$headrev" != "$rev" ]; then + headp="-p $headrev" + else + headp= + fi + commit=$(add_msg "$dir" "$headrev" "$rev" | + git commit-tree $tree $headp -p "$rev") || exit $? + git reset "$commit" || exit $? +} + +cmd_split() +{ + debug "Splitting $dir..." + cache_setup || exit $? + + if [ -n "$onto" ]; then + debug "Reading history for --onto=$onto..." + git rev-list $onto | + while read rev; do + # the 'onto' history is already just the subdir, so + # any parent we find there can be used verbatim + debug " cache: $rev" + cache_set $rev $rev + done + fi + + if [ -n "$ignore_joins" ]; then + unrevs= + else + unrevs="$(find_existing_splits "$dir" "$revs")" + fi + + # We can't restrict rev-list to only $dir here, because some of our + # parents have the $dir contents the root, and those won't match. + # (and rev-list --follow doesn't seem to solve this) + grl='git rev-list --reverse --parents $revs $unrevs' + revmax=$(eval "$grl" | wc -l) + revcount=0 + createcount=0 + eval "$grl" | + while read rev parents; do + revcount=$(($revcount + 1)) + say -n "$revcount/$revmax ($createcount) " + debug "Processing commit: $rev" + exists=$(cache_get $rev) + if [ -n "$exists" ]; then + debug " prior: $exists" + continue + fi + createcount=$(($createcount + 1)) + debug " parents: $parents" + newparents=$(cache_get $parents) + debug " newparents: $newparents" + + tree=$(subtree_for_commit $rev "$dir") + debug " tree is: $tree" + [ -z $tree ] && continue + + newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $? + debug " newrev is: $newrev" + cache_set $rev $newrev + cache_set latest_new $newrev + cache_set latest_old $rev + done || exit $? + latest_new=$(cache_get latest_new) + if [ -z "$latest_new" ]; then + die "No new revisions were found" + fi + + if [ -n "$rejoin" ]; then + debug "Merging split branch into HEAD..." + latest_old=$(cache_get latest_old) + git merge -s ours \ + -m "$(merge_msg $dir $latest_old $latest_new)" \ + $latest_new >&2 + fi + echo $latest_new + exit 0 +} + +cmd_merge() +{ + ensure_clean + + set -- $revs + if [ $# -ne 1 ]; then + die "You must provide exactly one revision. Got: '$revs'" + fi + rev="$1" + + git merge -s subtree $rev +} + +cmd_pull() +{ + ensure_clean + set -x + git pull -s subtree "$@" +} + +"cmd_$command" "$@" -- 1.6.3.rc2.8.gbe66.dirty -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html