On Sun, Sep 23, 2018 at 9:05 AM Lars Schneider <larsxschneider@xxxxxxxxx> wrote:
> I recently had to purge files from large Git repos (many files, many commits).
> The usual recommendation is to use `git filter-branch --index-filter` to purge
> files. However, this is *very* slow for large repos (e.g. it takes 45min to
> remove the `builtin` directory from git core). I realized that I can remove
> files *way* faster by exporting the repo, removing the file references,
> and then importing the repo (see Perl script below, it takes ~30sec to remove
> the `builtin` directory from git core). Do you see any problem with this
> approach?

A couple of comments:

For purging files from a history, take a look at BFG[1], which bills itself
as "a simpler, faster alternative to git-filter-branch for cleansing bad data
out of your Git repository history".

The approach of exporting to a fast-import stream, modifying the stream, and
re-importing is quite reasonable (a minimal sketch of it follows below).
However, rather than re-inventing the wheel, take a look at reposurgeon[2],
which allows you to do major surgery on fast-import streams. Not only can it
purge files from a repository, but it can slice, dice, puree, and saute
pretty much any attribute of a repository.

[1]: https://rtyley.github.io/bfg-repo-cleaner/
[2]: http://www.catb.org/esr/reposurgeon/
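
For reference, here is a minimal, untested sketch of the export/filter/import
approach in Python (your mail mentions a Perl script, which isn't quoted here,
so this is illustrative only, not your implementation). It pipes
`git fast-export --all` through a filter that drops any filemodify ('M') or
filedelete ('D') entry whose path falls under the directory being purged, and
feeds the result to `git fast-import --force`. It assumes unquoted paths (no
spaces or special characters in filenames) and is best run on a throwaway
clone; `builtin/` is just the directory from your example.

import subprocess

# Untested sketch. PURGE_PREFIX is the directory to drop from history;
# 'builtin/' matches the example above.
PURGE_PREFIX = b'builtin/'

export = subprocess.Popen(['git', 'fast-export', '--all'],
                          stdout=subprocess.PIPE)
imp = subprocess.Popen(['git', 'fast-import', '--force'],
                       stdin=subprocess.PIPE)
src, dst = export.stdout, imp.stdin

line = src.readline()
while line:
    if line.startswith(b'data '):
        # 'data <count>' introduces a raw payload (blob contents or a
        # commit message). fast-export always uses exact byte counts, so
        # copy the payload byte-for-byte; its lines must never be
        # mistaken for stream commands.
        count = int(line[5:])
        dst.write(line)
        dst.write(src.read(count))
    elif line.startswith(b'M ') or line.startswith(b'D '):
        # 'M <mode> <dataref> <path>' or 'D <path>': drop entries whose
        # path falls under the purged directory. Assumes git did not
        # quote the path.
        path = line.rstrip(b'\n').split(b' ', 3)[-1]
        if not path.startswith(PURGE_PREFIX):
            dst.write(line)
    else:
        # Everything else (commit, tag, reset, blob headers, ...) passes
        # through unchanged. Blobs that end up unreferenced are pruned
        # by a later gc.
        dst.write(line)
    line = src.readline()

dst.close()
imp.wait()
export.wait()

Note that BFG and reposurgeon handle the corner cases a quick filter like
this ignores (quoted paths, renames/copies when exporting with -M/-C, and
so on), which is another argument for not re-inventing this.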