> On Sep 23, 2018, at 4:55 PM, Eric Sunshine <sunshine@xxxxxxxxxxxxxx> wrote:
>
> On Sun, Sep 23, 2018 at 9:05 AM Lars Schneider <larsxschneider@xxxxxxxxx> wrote:
>> I recently had to purge files from large Git repos (many files, many
>> commits). The usual recommendation is to use `git filter-branch
>> --index-filter` to purge files. However, this is *very* slow for large
>> repos (e.g. it takes 45min to remove the `builtin` directory from git
>> core). I realized that I can remove files *way* faster by exporting
>> the repo, removing the file references, and then importing the repo
>> (see Perl script below, it takes ~30sec to remove the `builtin`
>> directory from git core). Do you see any problem with this approach?
>
> A couple comments:
>
> For purging files from a history, take a look at BFG[1] which bills
> itself as "a simpler, faster alternative to git-filter-branch for
> cleansing bad data out of your Git repository history".

Yes, BFG is great. Unfortunately, it requires Java, which is not
available on every system I have to work with. I needed a solution
that works in every Git environment, hence the Perl script :-)

> The approach of exporting to a fast-import stream, modifying the
> stream, and re-importing is quite reasonable.

Thanks for the confirmation!

> However, rather than
> re-inventing, take a look at reposurgeon[2], which allows you to do
> major surgery on fast-import streams. Not only can it purge files from
> a repository, but it can slice, dice, puree, and saute pretty much any
> attribute of a repository.

Wow. Reposurgeon looks very interesting. Thanks a lot for the pointer!

Cheers,
Lars

> [1]: https://rtyley.github.io/bfg-repo-cleaner/
> [2]: http://www.catb.org/esr/reposurgeon/
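
P.S. Since the script itself isn't reproduced above, here is a minimal
sketch of this kind of fast-export stream filter, illustrative only
and not the actual script from the original mail: it naively assumes
unquoted paths and a `git fast-export` run without rename/copy
detection (`-M`/`-C`), and the hardcoded `builtin` prefix is just the
example path from the thread.

  #!/usr/bin/perl
  use strict;
  use warnings;

  # Path prefix to purge from history (assumption for this sketch).
  my $purge = qr{^builtin(?:/|$)};

  while (my $line = <STDIN>) {
      # Copy 'data <count>' payloads through verbatim so blob and
      # commit-message bytes are never parsed as stream commands.
      if ($line =~ /^data (\d+)$/) {
          print $line;
          my $want = $1;
          my $got  = read(STDIN, my $payload, $want);
          die "short read on data block"
              unless defined $got && $got == $want;
          print $payload;
          next;
      }
      # Drop filemodify/filedelete lines touching the purged path.
      # (Naive: assumes unquoted paths, no -M/-C rename/copy lines.)
      next if $line =~ /^M \d+ \S+ (.+)$/ && $1 =~ $purge;
      next if $line =~ /^D (.+)$/         && $1 =~ $purge;
      print $line;
  }

It would be run as a pipeline into a fresh repository, e.g.:

  git init ../purged
  git fast-export --all | perl purge.pl | git -C ../purged fast-import

One caveat: commits that only touched the purged path survive as empty
commits with this sketch, since fast-import happily creates commits
with no file changes.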