Ian Campbell <ijc@xxxxxxxxxxxxxx> writes: > Allowing for incremental updates of large trees. "by doing what" is missing. And ... > > I have been using this as part of the device tree extraction from the Linux > kernel source since 2013, about time I sent the patch upstream! ... this does not help understanding what is going on. It belongs to the space after three dashes. Perhaps Subject: filter-branch: stash away ref map in a branch With "--state-branch=<branchname>" option, the mapping from old object names and filtered ones in ./map/ directory is stashed away in the object database, and the one from the previous run is read to populate the ./map/ directory, allowing for incremental updates of large trees. or something? > > Signed-off-by: Ian Campbell <ijc@xxxxxxxxxxxxxx> > --- > git-filter-branch.sh | 39 ++++++++++++++++++++++++++++++++++++++- > 1 file changed, 38 insertions(+), 1 deletion(-) > > diff --git a/git-filter-branch.sh b/git-filter-branch.sh > index 3a74602ef..d07db3fee 100755 > --- a/git-filter-branch.sh > +++ b/git-filter-branch.sh > @@ -86,7 +86,7 @@ USAGE="[--setup <command>] [--env-filter <command>] > [--parent-filter <command>] [--msg-filter <command>] > [--commit-filter <command>] [--tag-name-filter <command>] > [--subdirectory-filter <directory>] [--original <namespace>] > - [-d <directory>] [-f | --force] > + [-d <directory>] [-f | --force] [--state-branch <branch>] > [--] [<rev-list options>...]" > > OPTIONS_SPEC= > @@ -106,6 +106,7 @@ filter_msg=cat > filter_commit= > filter_tag_name= > filter_subdir= > +state_branch= > orig_namespace=refs/original/ > force= > prune_empty= > @@ -181,6 +182,9 @@ do > --original) > orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/ > ;; > + --state-branch) > + state_branch="$OPTARG" > + ;; > *) > usage > ;; > @@ -252,6 +256,20 @@ export GIT_INDEX_FILE > # map old->new commit ids for rewriting parents > mkdir ../map || die "Could not create map/ directory" > > +if [ -n "$state_branch" ] ; then > + state_commit=`git show-ref -s "$state_branch"` I hate to nitpick styles, especially on this script that already has existing violations, but for completeness: Style: we prefer to write $(command substitution) instead. Style: we prefer to write "if test", not "if [". Style: we prefer to avoid ';' and write "if test condtion" and "then" on different lines. It is a bit curious use of "show-ref". It is not wrong per-se, but "git rev-parse" may be more common. I do not care too deeply either way, though. Don't we want to make sure the value given to --state-branch is a full refname, not just a branch name? What happens when you say filter-branch --state-branch master by mistake? "show-ref -s" is likely to show your refs/heads/master, and other master branches that appear as remote-tracking branches for the remotes you interact with. > + if [ -n "$state_commit" ] ; then > + echo "Populating map from $state_branch ($state_commit)" 1>&2 > + git show "$state_commit":filter.map | > + perl -n -e 'm/(.*):(.*)/ or die; > + open F, ">../map/$1" or die; > + print F "$2" or die; > + close(F) or die' The process calling this perl script, which carefully diagnoses malformed input and dies, does not seem to do anything when it sees errors. Intended? > + else > + echo "Branch $state_branch does not exist. Will create" 1>&2 > + fi > +fi > + > # we need "--" only if there are no path arguments in $@ > nonrevs=$(git rev-parse --no-revs "$@") || exit > if test -z "$nonrevs" > @@ -544,6 +562,25 @@ if [ "$filter_tag_name" ]; then > done > fi > > +if [ -n "$state_branch" ] ; then > + echo "Saving rewrite state to $state_branch" 1>&2 > + STATE_BLOB=$(ls ../map | > + perl -n -e 'chomp(); > + open F, "<../map/$_" or die; > + chomp($f = <F>); print "$_:$f\n";' | I see it somewhat gross to pipe the output of "/bin/ls" to a Perl script, instead of iterating over "while (<../map/*>)" inside the script itself. > + git hash-object -w --stdin ) > + STATE_TREE=$(/bin/echo -e "100644 blob $STATE_BLOB\tfilter.map" | git mktree) > + STATE_PARENT=$(git show-ref -s "$state_branch") Don't you already have this in $state_commit? One advantage of reading $state_branch again at this point is to detect mistakes of running more than one filter-branch (which may cause you to read $STATE_PARENT that is different from $state_commit you read earlier), but I do not think that is being done here, so... > + unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE > + unset GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE Hmph. I can see that you are trying not to be affected by the committers and authors of the commits on the branch being filtered (which are set by finish_ident shell function), but I wonder if we could (and more importantly "want to") do better to preserve the real committer the user who runs the script may have in the environment before running it. I guess it does not matter that much, as long as the user has properly user.{name,email} configured elsewhere without relying on the environment variable. > + if [ -n "$STATE_PARENT" ] ; then > + STATE_COMMIT=$(/bin/echo "Sync" | git commit-tree "$STATE_TREE" -p "$STATE_PARENT") > + else > + STATE_COMMIT=$(/bin/echo "Sync" | git commit-tree "$STATE_TREE" ) > + fi > + git update-ref "$state_branch" "$STATE_COMMIT" > +fi > + > cd "$orig_dir" > rm -rf "$tempdir" Despite all the above comments, I like what you are trying to achieve here. Thanks for sharing.