On Thu, 31 Jan 2019 at 22:37, Elijah Newren <newren@xxxxxxxxx> wrote:
> On Thu, Jan 31, 2019 at 8:09 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> > Elijah Newren <newren@xxxxxxxxx> writes:
> >
> > > git-filter-repo[1], a filter-branch-like tool for rewriting repository
> > > history, is ready for more widespread testing and feedback. The rough
> > > edges I previously mentioned have been fixed, and it has several useful
> > > features already, though more development work is ongoing (docs are a
> > > bit sparse right now, though -h provides some help).
> > >
> > > Why filter-repo vs. filter-branch?

I like the name! I think a lot of users are interested in filtering
their entire repo, rather than rewriting a single branch.

> > How does it compare with bfg-repo-cleaner? Somehow I was led to
> > believe that all serious users of filter-branch like functionality
> > are using bfg-repo-cleaner instead.
>
> No, bfg-repo-cleaner only covers an important subset of the usecases.

That's true - the focus with BFG Repo-Cleaner is on removing unwanted
data - completely eradicating it from a repo's history. There are some
mistakes in history that repo owners just really *do not* want to share
(i.e. large files, private data/credentials), and they can be a critical
blocker to sharing or working with a Git repo. In terms of rewriting
history, my internal criterion for which features I really want to be in
the BFG is: is this unwanted data completely stopping many users from
sharing their code or doing their work?

I understand that when it comes to rewriting history, there are loads
of other operations that people sometimes want to perform, beyond
removing unwanted data - merging/splitting of history,
anonymization/renaming of committers, etc. Some of those might be nice
to add to the BFG - but as with many OSS maintainers, I have limited
time, and a life to balance outside of software...!

> bfg-repo-cleaner does a really good job if your goal is to remove a
> few big files and/or to remove some sensitive text (matched via
> regexes) from all blobs. It was designed for that specific role and
> has more options in this area than filter-repo currently has. But
> even within this design space it was optimized for, it is missing two
> things that I really want:
>
> * pruning of commits which become empty due to filtering

There certainly have been several users asking for this feature on the
BFG, and even a kindly contributed PR for the functionality, which I've
yet to merge. Since it doesn't actually stop users from doing work - so
far as I can see - it's something that I've done a poor job of following
up on.

> * providing a way for the user to know what needs to be cleaned up.
> It has options like --strip-blobs-bigger-than <size> or
> --strip-biggest-blobs <NUM>, but no way for the user to figure out
> what <size> or <NUM> should be.

For users of GitHub, it's normally 100MB with
--strip-blobs-bigger-than <size> :-)

> Also, since it just focuses on really
> big blobs, it misses cases like someone checking in directories with a
> huge number of small-to-moderately sized files (e.g. bower_components/
> or node_modules/, though these could also contain a few big blobs

For those use-cases, it might be that BFG's --delete-folders flag is
useful, especially given the protected-head-commit feature of the BFG.

It's getting late for me, and it must be even later in Brussels - I wish
I could have made it there to join in! Merry Git Merge to you all, and
good luck to you, Elijah, with git-filter-repo.

Roberto
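
On the question quoted above of how a user might work out what <size>
or <NUM> should be: the following is only a rough sketch, not part of
the BFG or git-filter-repo. It assumes a git new enough to support
`git cat-file --batch-all-objects`, and the names `largest_blobs` and
`scan_repo` are made up for illustration. It lists the largest blobs in
the object store, which gives a starting point for choosing a threshold.
Note that it also counts objects that are no longer reachable from any
ref.

```python
import subprocess
from typing import Iterable, List, Tuple

# Output format for `git cat-file --batch-check`: one object per line,
# as "<type> <object id> <size in bytes>".
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize)"


def largest_blobs(lines: Iterable[str], limit: int = 10) -> List[Tuple[int, str]]:
    """Return up to `limit` (size_in_bytes, object_id) pairs, largest first.

    Hypothetical helper: parses lines in BATCH_FORMAT and ignores anything
    that is not a blob (commits, trees, tags, malformed lines).
    """
    blobs = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or parts[0] != "blob":
            continue
        blobs.append((int(parts[2]), parts[1]))
    blobs.sort(reverse=True)
    return blobs[:limit]


def scan_repo(repo_path: str = ".", limit: int = 10) -> List[Tuple[int, str]]:
    """Hypothetical wrapper: ask git for every object in the repository at
    `repo_path` and keep only the largest blobs."""
    result = subprocess.run(
        ["git", "-C", repo_path, "cat-file", "--batch-all-objects",
         f"--batch-check={BATCH_FORMAT}"],
        check=True, capture_output=True, text=True,
    )
    return largest_blobs(result.stdout.splitlines(), limit)
```

Calling `scan_repo(".", 20)` inside a clone would return the twenty
largest blobs as (size, object id) pairs. Anything above the hosting
limit mentioned above is an obvious candidate for
--strip-blobs-bigger-than, and the count of oversized entries suggests
a value for --strip-biggest-blobs.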