Hi,

I recently had to purge files from large Git repos (many files, many
commits). The usual recommendation is to use `git filter-branch
--index-filter` to purge files. However, this is *very* slow for large
repos (e.g. it takes 45min to remove the `builtin` directory from git
core). I realized that I can remove files *way* faster by exporting the
repo, removing the file references, and then importing the repo (see
Perl script below; it takes ~30sec to remove the `builtin` directory
from git core). Do you see any problem with this approach?

Thank you,
Lars


#!/usr/bin/perl
#
# Purge paths from Git repositories.
#
# Usage:
#   git-purge-path [path-regex1] [path-regex2] ...
#
# Examples:
#   Remove the file "test.bin" from all directories:
#     git-purge-path "/test.bin$"
#
#   Remove all "*.bin" files from all directories:
#     git-purge-path "\.bin$"
#
#   Remove all files in the "/foo" directory:
#     git-purge-path "^/foo/"
#
# Attention:
#   You want to run this script on a case-sensitive file system (e.g.
#   ext4 on Linux). Otherwise the resulting Git repository will not
#   contain changes that modify the casing of file paths.
#
use strict;
use warnings;

open( my $pipe_in,  "git fast-export --progress=100 --no-data HEAD |" ) or die $!;
open( my $pipe_out, "| git fast-import --force --quiet" ) or die $!;

LOOP: while ( my $cmd = <$pipe_in> ) {
    my $data = "";
    if ( $cmd =~ /^data ([0-9]+)$/ ) {
        # Consume the data block (a commit or tag message, since
        # --no-data omits blobs) so its bytes are not parsed as
        # commands; it is passed through verbatim below.
        my $skip_bytes = $1;
        read( $pipe_in, $data, $skip_bytes );
    }
    elsif ( $cmd =~ /^M [0-9]{6} [0-9a-f]{40} (.+)$/ ) {
        # Drop "filemodify" lines whose path matches any of the given
        # regexes, which removes the file from the rewritten commit.
        my $pathname = $1;
        foreach (@ARGV) {
            next LOOP if ( "/" . $pathname ) =~ /$_/;
        }
    }
    print {$pipe_out} $cmd . $data;
}

# Wait for both git processes to finish and check that they succeeded.
close($pipe_in)  or die "git fast-export failed";
close($pipe_out) or die "git fast-import failed";
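
P.S.: In case it helps, this is roughly how I invoke it (assuming the
script is saved as "git-purge-path" somewhere on the PATH and run from
the top level of the clone; the last two commands are the usual cleanup
after any history rewrite and are not part of the script):

    $ cd git                        # e.g. a clone of git.git
    $ git-purge-path "^/builtin/"
    $ git reflog expire --expire=now --all
    $ git gc --prune=now

Since fast-import only writes new objects and updates the refs, the
purged blobs stay in the object database until they are pruned, so the
repository does not actually shrink before the gc.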