On 8/26/2019 7:52 PM, Elijah Newren wrote: > filter-branch suffers from a huge number of pitfalls that can result in > incorrectly rewritten history, and many of the problems can easily go > undetected until the new repository is in use. This can result in > problems ranging from an even messier history than what led folks to > filter-branch in the first place, to data loss or corruption. These > issues cannot be backward compatibly fixed, so add a warning to the > filter-branch manpage about this and recommand that another tool (such > as filter-repo) be used instead. > > Also, update other manpages that referenced filter-branch. Several of > these needed updates even if we could continue recommending > filter-branch, either due to implying that something was unique to > filter-branch when it applied more generally to all history rewriting > tools (e.g. BFG, reposurgeon, fast-import, filter-repo), or because > something about filter-branch was used as an example despite other more > commonly known examples now existing. Reword these sections to fix > these issues and to avoid recommending filter-branch. > > Finally, remove the section explaining BFG Repo Cleaner as an > alternative to filter-branch. I feel somewhat bad about this, > especially since I feel like I learned so much from BFG that I put to > good use in filter-repo (which is much more than I can say for > filter-branch), but keeping that section presented a few problems: > * In order to recommend that people quit using filter-branch, we need > to provide them a recomendation for something else to use that > can handle all the same types of rewrites. To my knowledge, > filter-repo is the only such tool. So it needs to be mentioned. > * I don't want to give conflicting recommendations to users > * If we recommend two tools, we shouldn't expect users to learn both > and pick which one to use; we should explain which problems one > can solve that the other can't or when one is much faster than > the other. > * BFG and filter-repo have similar performance > * All filtering types that BFG can do, filter-repo can also do. In > fact, filter-repo comes with a reimplementation of BFG named > bfg-ish which provides the same user-interface as BFG but with > several bugfixes and new features that are hard to implement in > BFG due to its technical underpinnings. > While I could still mention both tools, it seems like I would need to > provide some kind of comparison and I would ultimately just say that > filter-repo can do everything BFG can, so ultimately it seems that it > is just better to remove that section altogether. > > Signed-off-by: Elijah Newren <newren@xxxxxxxxx> > --- > Documentation/git-fast-export.txt | 6 ++--- > Documentation/git-filter-branch.txt | 42 ++++++++--------------------- > Documentation/git-gc.txt | 17 ++++++------ > Documentation/git-rebase.txt | 2 +- > Documentation/git-replace.txt | 10 +++---- > Documentation/git-svn.txt | 4 +-- > Documentation/githooks.txt | 7 ++--- > contrib/svn-fe/svn-fe.txt | 4 +-- > 8 files changed, 36 insertions(+), 56 deletions(-) > > diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt > index cc940eb9ad..784e934009 100644 > --- a/Documentation/git-fast-export.txt > +++ b/Documentation/git-fast-export.txt > @@ -17,9 +17,9 @@ This program dumps the given revisions in a form suitable to be piped > into 'git fast-import'. > > You can use it as a human-readable bundle replacement (see > -linkgit:git-bundle[1]), or as a kind of an interactive > -'git filter-branch'. > - > +linkgit:git-bundle[1]), or as a format that can be edited before being > +fed to 'git fast-import' in order to do history rewrites (an ability > +relied on by tools like 'git filter-repo'). > > OPTIONS > ------- > diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt > index 6b53dd7e06..8c586eed55 100644 > --- a/Documentation/git-filter-branch.txt > +++ b/Documentation/git-filter-branch.txt > @@ -16,6 +16,17 @@ SYNOPSIS > [--original <namespace>] [-d <directory>] [-f | --force] > [--state-branch <branch>] [--] [<rev-list options>...] > > +WARNING > +------- > +'git filter-branch' has a litany of gotchas that can and will cause > +history to be rewritten incorrectly (in addition to abysmal > +performance). These issues cannot be backward compatibly fixed and as > +such, its use is not recommended. Please use an alternative history > +filtering tool such as 'git filter-repo'. If you still need to use > +'git filter-branch', please carefully read the "Safety" section of > +https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@xxxxxxxxxxxxxx/ Is it possible to present this URL as a hyperlink with a succinct description? Maybe 'carefully read [the "Safety" section of this message on the Git mailing list](url).' (I'm using Markdown notation here as I don't know the equivalent for our docs.) > +and avoid as many of the pitfalls listed there as reasonably possible. > + > DESCRIPTION > ----------- > Lets you rewrite Git revision history by rewriting the branches mentioned > @@ -445,37 +456,6 @@ warned. > (or if your git-gc is not new enough to support arguments to > `--prune`, use `git repack -ad; git prune` instead). > > -NOTES > ------ > - > -git-filter-branch allows you to make complex shell-scripted rewrites > -of your Git history, but you probably don't need this flexibility if > -you're simply _removing unwanted data_ like large files or passwords. > -For those operations you may want to consider > -http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner], > -a JVM-based alternative to git-filter-branch, typically at least > -10-50x faster for those use-cases, and with quite different > -characteristics: > - > -* Any particular version of a file is cleaned exactly _once_. The BFG, > - unlike git-filter-branch, does not give you the opportunity to > - handle a file differently based on where or when it was committed > - within your history. This constraint gives the core performance > - benefit of The BFG, and is well-suited to the task of cleansing bad > - data - you don't care _where_ the bad data is, you just want it > - _gone_. > - > -* By default The BFG takes full advantage of multi-core machines, > - cleansing commit file-trees in parallel. git-filter-branch cleans > - commits sequentially (i.e. in a single-threaded manner), though it > - _is_ possible to write filters that include their own parallelism, > - in the scripts executed against each commit. > - > -* The http://rtyley.github.io/bfg-repo-cleaner/#examples[command options] > - are much more restrictive than git-filter branch, and dedicated just > - to the tasks of removing unwanted data- e.g: > - `--strip-blobs-bigger-than 1M`. > - > GIT > --- > Part of the linkgit:git[1] suite > diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt > index 247f765604..0c114ad1ca 100644 > --- a/Documentation/git-gc.txt > +++ b/Documentation/git-gc.txt > @@ -115,15 +115,14 @@ NOTES > ----- > > 'git gc' tries very hard not to delete objects that are referenced > -anywhere in your repository. In > -particular, it will keep not only objects referenced by your current set > -of branches and tags, but also objects referenced by the index, > -remote-tracking branches, refs saved by 'git filter-branch' in > -refs/original/, reflogs (which may reference commits in branches > -that were later amended or rewound), and anything else in the refs/* namespace. > -If you are expecting some objects to be deleted and they aren't, check > -all of those locations and decide whether it makes sense in your case to > -remove those references. > +anywhere in your repository. In particular, it will keep not only > +objects referenced by your current set of branches and tags, but also > +objects referenced by the index, remote-tracking branches, notes saved > +by 'git notes' under refs/notes/, reflogs (which may reference commits > +in branches that were later amended or rewound), and anything else in > +the refs/* namespace. If you are expecting some objects to be deleted > +and they aren't, check all of those locations and decide whether it > +makes sense in your case to remove those references. > > On the other hand, when 'git gc' runs concurrently with another process, > there is a risk of it deleting an object that the other process is using > diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt > index 6156609cf7..2f201d85d4 100644 > --- a/Documentation/git-rebase.txt > +++ b/Documentation/git-rebase.txt > @@ -832,7 +832,7 @@ Hard case: The changes are not the same.:: > This happens if the 'subsystem' rebase had conflicts, or used > `--interactive` to omit, edit, squash, or fixup commits; or > if the upstream used one of `commit --amend`, `reset`, or > - `filter-branch`. > + a full history rewriting command like `filter-repo`. > > > The easy case > diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt > index 246dc9943c..35595a2cd3 100644 > --- a/Documentation/git-replace.txt > +++ b/Documentation/git-replace.txt > @@ -123,10 +123,10 @@ The following format are available: > CREATING REPLACEMENT OBJECTS > ---------------------------- > > -linkgit:git-filter-branch[1], linkgit:git-hash-object[1] and > -linkgit:git-rebase[1], among other git commands, can be used to create > -replacement objects from existing objects. The `--edit` option can > -also be used with 'git replace' to create a replacement object by > +linkgit:git-hash-object[1], linkgit:git-rebase[1], and > +linkgit:git-filter-repo[1], among other git commands, can be used to > +create replacement objects from existing objects. The `--edit` option > +can also be used with 'git replace' to create a replacement object by > editing an existing object. > > If you want to replace many blobs, trees or commits that are part of a > @@ -148,8 +148,8 @@ pending objects. > SEE ALSO > -------- > linkgit:git-hash-object[1] > -linkgit:git-filter-branch[1] > linkgit:git-rebase[1] > +linkgit:git-filter-repo[1] > linkgit:git-tag[1] > linkgit:git-branch[1] > linkgit:git-commit[1] > diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt > index 30711625fd..f2762dd5d4 100644 > --- a/Documentation/git-svn.txt > +++ b/Documentation/git-svn.txt > @@ -769,9 +769,9 @@ option for (hopefully) obvious reasons. > + > This option is NOT recommended as it makes it difficult to track down > old references to SVN revision numbers in existing documentation, bug > -reports and archives. If you plan to eventually migrate from SVN to Git > +reports, and archives. If you plan to eventually migrate from SVN to Git > and are certain about dropping SVN history, consider > -linkgit:git-filter-branch[1] instead. filter-branch also allows > +linkgit:git-filter-repo[1] instead. filter-repo also allows > reformatting of metadata for ease-of-reading and rewriting authorship > info for non-"svn.authorsFile" users. > > diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt > index 82cd573776..997548f5ed 100644 > --- a/Documentation/githooks.txt > +++ b/Documentation/githooks.txt > @@ -425,9 +425,10 @@ post-rewrite > > This hook is invoked by commands that rewrite commits > (linkgit:git-commit[1] when called with `--amend` and > -linkgit:git-rebase[1]; currently `git filter-branch` does 'not' call > -it!). Its first argument denotes the command it was invoked by: > -currently one of `amend` or `rebase`. Further command-dependent > +linkgit:git-rebase[1]; however, full-history (re)writing tools like > +linkgit:git-fast-import[1] or linkgit:git-filter-repo[1] typically do > +not call it!). Its first argument denotes the command it was invoked > +by: currently one of `amend` or `rebase`. Further command-dependent > arguments may be passed in the future. > > The hook receives a list of the rewritten commits on stdin, in the > diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt > index a3425f4770..19333fc8df 100644 > --- a/contrib/svn-fe/svn-fe.txt > +++ b/contrib/svn-fe/svn-fe.txt > @@ -56,7 +56,7 @@ line. This line has the form `git-svn-id: URL@REVNO UUID`. > > The resulting repository will generally require further processing > to put each project in its own repository and to separate the history > -of each branch. The 'git filter-branch --subdirectory-filter' command > +of each branch. The 'git filter-repo --subdirectory-filter' command > may be useful for this purpose. > > BUGS > @@ -67,5 +67,5 @@ The exit status does not reflect whether an error was detected. > > SEE ALSO > -------- > -git-svn(1), svn2git(1), svk(1), git-filter-branch(1), git-fast-import(1), > +git-svn(1), svn2git(1), svk(1), git-filter-repo(1), git-fast-import(1), > https://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt >