On Mon, Apr 21, 2008 at 05:53:34PM -0400, Avery Pennarun wrote:

> Does anyone know the most efficient way to do this with
> git-filter-branch, when there are already thousands of files in the
> repo with CRLF in them?  Running dos2unix on all the files for every
> single revision could take a *very* long time.

Yes, a tree filter would probably be quite slow, since it has to check
out every revision and then munge all of the files.

You could maybe do an index filter that gets the blob SHA1 of each file
that is new, and just munges those. But I think it is even simpler to
just keep a cache of original blob hashes mapping to munged blob hashes.
Something like:

  git filter-branch --index-filter '
    git ls-files --stage |
    perl /path/to/caching-munger |
    git update-index --index-info
  '

where your caching munger looks something like:

-- >8 --
#!/usr/bin/perl

use strict;
use DB_File;
use Fcntl;

# Persistent map from original blob hash to munged blob hash.
tie my %cache, 'DB_File', "$ENV{HOME}/filter-cache", O_RDWR|O_CREAT, 0666
  or die "unable to open db: $!";

while(<>) {
  # Each "git ls-files --stage" line is: <mode> <hash> <stage>\t<path>
  my ($mode, $hash, $path) =
    /^(\d+) ([0-9a-f]{40}) \d\t(.*)/
      or die "bad ls-files line: $_";
  # Only munge blobs we have not seen before.
  $cache{$hash} = munge($hash)
    unless exists $cache{$hash};
  print "$mode $cache{$hash}\t$path\n";
}

# Write a munged copy of blob $h to the object database; return its hash.
sub munge {
  my $h = shift;
  my $r = scalar `git show $h | sed 's/\$/\\r/' | git hash-object -w --stdin`;
  chomp $r;
  return $r;
}
-- 8< --

So we keep a dbm of the hash mapping, and do no work if we have already
seen this blob. If we haven't, then we actually do the expensive
'show | munge | hash-object'. And here our munge adds a CR, but you
should be able to substitute an arbitrary transformation.

-Peff
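
The munge in the script above adds a CR to every line, whereas the original question was about getting rid of CRLF. As a rough sketch of the opposite transformation, the shell functions below could stand in for the 'show | munge | hash-object' pipeline. The names strip_cr and munge_blob are made up for illustration, and the sketch assumes every blob is text; it does not try to detect binary files.

```shell
#!/bin/sh

# Hypothetical helper: remove one trailing CR from each line of stdin.
# A literal CR is spliced into the sed expression via printf, so this
# does not rely on sed understanding the \r escape.
strip_cr() {
    cr=$(printf '\r')
    sed "s/${cr}\$//"
}

# Hypothetical helper: write a CR-stripped copy of blob $1 to the object
# database and print the new blob hash. Must be run inside a git repository.
munge_blob() {
    git show "$1" | strip_cr | git hash-object -w --stdin
}
```

To try it, the backtick command in the Perl munge sub would be replaced with an equivalent pipeline; the caching logic stays the same, since the cache only maps old hashes to new ones and does not care what the transformation is.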