All,

I recently needed to extract the git history of a portion of an existing
repository. My initial attempts using --subdirectory-filter, subtrees, etc.
weren't as successful as I'd hoped. The primary reason for my failures was
that this particular source repository has seen a lot of code movement and
in-place renames. As a result, filters such as --subdirectory-filter failed
to keep commits made prior to the renames.

So, long story short, I've attached below a hacked-together script (yes, it's
sad when one writes a script to call a script :-/) that solves the problem
for me. My hope is that some other poor sob in my position discovers this
script, uses it and moves on.

If enough people think it's useful despite the corner cases [1], I'd be happy
to work on integrating it into filter-branch.

thx,

Jason.

[1] Namely that if two different files held the same full-path name at
different times in the source repo, you'll get some errant commits in the
history.

------------------->8--------------------------------------------------
#!/bin/bash
#
# git-filter-renames: Similar to --subdirectory-filter but tracks renames
#
# Basic use:
#   $ git clone path/to/source_repo dest_repo
#   $ cd dest_repo
#   $ git tag | xargs git tag -d  # ours are signed, so would fail to verify
#   $ git remote remove origin
#   $ git gc --aggressive --prune=now --force
#   $ git fsck
#   $ git-filter-renames.sh "[PREFIX] " fileA subdirB/ fileC subdirD/subdirE ...
#   $ rm -rf .git/refs/original
#   $ git gc --aggressive --prune=now --force
#   $ git fsck

# DEBUG=1 keeps the temp dir after the run (and wipes stale ones at startup)
DEBUG=1

# need the subject prefix plus at least one path
if [ $# -le 1 ]; then
    echo >&2 "Usage:"
    echo >&2 "  ${0##*/} '[subj prefix] ' fileA fileB dir1 sub/dir2"
    echo >&2 ""
    exit 1
fi

if [ "$DEBUG" == 1 ]; then
    rm -rf /tmp/git-filter-renames-*
fi

TMP_DIR="$(mktemp -d /tmp/git-filter-renames-XXXXXX)"

PREFIX="${1}"
shift

# take in the list of files to preserve
# note: directories are recursed
echo -n "" >"$TMP_DIR/user_list.txt"
for arg in "$@"; do
    if [ -d "$arg" ]; then
        find "$arg" -type f >>"$TMP_DIR/user_list.txt"
    elif [ -f "$arg" ]; then
        echo "$arg" >>"$TMP_DIR/user_list.txt"
    else
        echo >&2 "What the hell is '$arg'?"
    fi
done

# for each file, collect every path it has ever had (following renames)
# and emit each one as an anchored pattern: ^path$
echo -n "" >"$TMP_DIR/trace_list.txt"
while read -r fn <&4; do
    while read -r ofn <&5; do
        echo "^$ofn\$"
    done 5< <(git log --format=format: --follow --name-only -- "$fn" | \
        sed -e '/^$/d' | sort -u)
done 4< <(cat "$TMP_DIR/user_list.txt") | sort -u >>"$TMP_DIR/trace_list.txt"

# stage the filter script: drop every indexed path not in the trace list
# ($TMP_DIR is expanded now, when the file is written)
cat >"$TMP_DIR/filter.sh" <<EOF
git ls-files | \\
    grep -vf $TMP_DIR/trace_list.txt | \\
    xargs -r git rm -qrf --ignore-unmatch
EOF
chmod +x "$TMP_DIR/filter.sh"

# stage the msg filter script: prepend the prefix to the subject line
# note: a '/' in the prefix would break the sed expression
cat >"$TMP_DIR/msg_filter.sh" <<EOF
sed -e "1 s/^/$PREFIX/"
EOF
chmod +x "$TMP_DIR/msg_filter.sh"

# do the filtering
echo >&2 "Doing filtering"
git filter-branch --prune-empty -f --index-filter "$TMP_DIR/filter.sh" \
    --msg-filter "$TMP_DIR/msg_filter.sh" \
    HEAD

# cleanup
if [ "$DEBUG" == 0 ]; then
    rm -rf "$TMP_DIR"
fi
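
One note on the filter step. The trace list holds anchored regular expressions of the form ^path$, so a '.' in a file name matches any character, and an unrelated path that differs only at that position would survive the filter as well. The sketch below shows that behaviour next to a fixed-string variant. It is only an illustration, not part of the script: the function names and sample paths are made up, and the fixed-string form assumes a grep that supports -x and -F (GNU grep does).

```shell
#!/bin/bash
# Sketch only: function names and sample paths are hypothetical.

# Turn a list of paths (one per line on stdin) into anchored patterns,
# the same shape as the entries in trace_list.txt: ^path$
anchor_patterns() {
    sed -e 's/^/^/' -e 's/$/$/'
}

# Print the paths from stdin that would be removed, i.e. those matched by
# no pattern in the pattern file $1 (this mirrors the script's grep -vf).
paths_to_remove_regex() {
    grep -vf "$1"
}

# Assumed alternative: read $1 as fixed strings and require whole-line
# matches, so '.' in a file name is no longer a wildcard.
paths_to_remove_fixed() {
    grep -vxFf "$1"
}

tmp="$(mktemp -d)"

# paths we want to keep (current name plus a pre-rename name)
printf '%s\n' 'src/a.c' 'old/a.c' >"$tmp/keep.txt"
anchor_patterns <"$tmp/keep.txt" >"$tmp/patterns.txt"

# what "git ls-files" might print for some commit
printf '%s\n' 'src/a.c' 'src/axc' 'src/b.c' 'old/a.c' >"$tmp/index.txt"

regex_out="$(paths_to_remove_regex "$tmp/patterns.txt" <"$tmp/index.txt")"
fixed_out="$(paths_to_remove_fixed "$tmp/keep.txt" <"$tmp/index.txt")"
rm -rf "$tmp"

echo "regex filter removes:"
echo "$regex_out"    # src/b.c only; src/axc slips through via the '.'
echo "fixed-string filter removes:"
echo "$fixed_out"    # src/axc and src/b.c
```

If you wanted the fixed-string behaviour in the script itself, the trace list would have to be written without the ^ and $ anchors and the staged filter would use grep -vxFf instead of grep -vf. I have not tested that change against a real repository.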