Export the current file path as $GIT_BLOB_PATH, so we can filter a blob differently based on its path, and change the caching mechanism to re-filter a particular blob if its path changes. Also, make it much faster by not calling 'cat'. The main loop of munge_blobs() had to fork-exec "cat" every time through the loop, even when a blob was already cached. Let's use the sh builtin 'read' instead for a huge speedup. cd git time git filter-branch --blob-filter 'tr a-z A-Z' HEAD~10..HEAD (original --blob-filter) real 3m58.569s user 0m22.900s sys 3m32.030s (with 'cat' calls removed) real 1m11.931s user 0m8.520s sys 1m2.900s (with 'cat' calls removed and blob cache already filled) real 0m19.660s user 0m3.930s sys 0m15.720s Signed-off-by: Avery Pennarun <apenwarr@xxxxxxxxx> --- Documentation/git-filter-branch.txt | 27 +++++++++++++++++++++++++++ git-filter-branch.sh | 27 +++++++++++++++++---------- 2 files changed, 44 insertions(+), 10 deletions(-) diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt index ea77f1f..0c5cd0f 100644 --- a/Documentation/git-filter-branch.txt +++ b/Documentation/git-filter-branch.txt @@ -12,6 +12,7 @@ SYNOPSIS [--index-filter <command>] [--parent-filter <command>] [--msg-filter <command>] [--commit-filter <command>] [--tag-name-filter <command>] [--subdirectory-filter <directory>] + [--blob-filter <command] [--original <namespace>] [-d <directory>] [-f | --force] [<rev-list options>...] @@ -149,6 +150,16 @@ to other tags will be rewritten to point to the underlying commit. The result will contain that directory (and only that) as its project root. +--blob-filter <command>:: + This is the filter for modifying the contents of each file (blob) in + the tree. The contents of a file are provided on stdin, and the new + file contents should be provided on stdout. The pathname of the + blob in the current revision is in $GIT_BLOB_PATH. For efficiency, + the before/after results of a given blob+filename are only + calculated once and then cached, so your filter must always return + the same output blob for any given input blob. You might use this + filter for converting CRLF to LF in all your files, for example. + --original <namespace>:: Use this option to set the namespace where the original commits will be stored. The default value is 'refs/original'. @@ -196,6 +207,22 @@ git filter-branch --index-filter 'git update-index --remove filename' HEAD Now, you will get the rewritten history saved in HEAD. +To convert CRLF to LF in all your files using the "fromdos" program (be +careful: this will attempt to modify binary files too!): + +---------------------------------------------- +git filter-branch --blob-filter 'fromdos' HEAD +---------------------------------------------- + +To convert CRLF to LF in all your *.c and *.cpp files: + +--------------------------------------------------------- +git filter-branch --blob-filter 'case "$GIT_BLOB_PATH" in + *.c|*.cpp) fromdos;; + *) cat;; +esac' HEAD +--------------------------------------------------------- + To set a commit (which typically is at the tip of another history) to be the parent of the current initial commit, in order to paste the other history behind the current history: diff --git a/git-filter-branch.sh b/git-filter-branch.sh index a0d9a79..f1ee263 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -55,19 +55,24 @@ EOF eval "$functions" munge_blobs() { - while read mode sha1 stage path + while read GIT_BLOB_MODE GIT_BLOB_SHA1 stage GIT_BLOB_PATH do - if ! test -r "$workdir/../blob-cache/$sha1" + export GIT_BLOB_MODE GIT_BLOB_SHA1 GIT_BLOB_PATH + cachefile="$cachedir/$GIT_BLOB_SHA1/$GIT_BLOB_PATH" + if ! test -r "$cachefile" then - new=`git cat-file blob $sha1 | - eval "$filter_blob" | - git hash-object -w --stdin` - printf $new >$workdir/../blob-cache/$sha1 + new=$(git cat-file blob $GIT_BLOB_SHA1 | + eval "$filter_blob" | + git hash-object -w --stdin) + mkdir -p "$(dirname "$cachefile")" + echo -n $new >"$cachefile" + else + read new <"$cachefile" fi printf "%s %s\t%s\n" \ - "$mode" \ - $(cat "$workdir/../blob-cache/$sha1") \ - "$path" + "$GIT_BLOB_MODE" \ + "$new" \ + "$GIT_BLOB_PATH" done } @@ -108,6 +113,7 @@ USAGE="[--env-filter <command>] [--tree-filter <command>] \ [--index-filter <command>] [--parent-filter <command>] \ [--msg-filter <command>] [--commit-filter <command>] \ [--tag-name-filter <command>] [--subdirectory-filter <directory>] \ +[--blob-filter <command>] \ [--original <namespace>] [-d <directory>] [-f | --force] \ [<rev-list options>...]" @@ -249,7 +255,8 @@ ret=0 mkdir ../map || die "Could not create map/ directory" # cache rewritten blobs for blob filter -mkdir ../blob-cache || die "Could not create blob-cache/ directory" +cachedir="$workdir/../blob-cache" +mkdir "$cachedir" || die "Could not create blob-cache/ directory" case "$filter_subdir" in "") -- 1.5.6.rc2.29.g4717e -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html