filter-branch suffers from a deluge of disguised dangers that disfigure history rewrites (i.e. deviate from the deliberate changes). Many of these problems are unobtrusive and can easily go undiscovered until the new repository is in use. This can result in problems ranging from an even messier history than what led folks to filter-branch in the first place, to data loss or corruption. These issues cannot be backward compatibly fixed, so add a warning to both filter-branch and its manpage recommending that another tool (such as filter-repo) be used instead. Also, update other manpages that referenced filter-branch. Several of these needed updates even if we could continue recommending filter-branch, either due to implying that something was unique to filter-branch when it applied more generally to all history rewriting tools (e.g. BFG, reposurgeon, fast-import, filter-repo), or because something about filter-branch was used as an example despite other more commonly known examples now existing. Reword these sections to fix these issues and to avoid recommending filter-branch. Finally, remove the section explaining BFG Repo Cleaner as an alternative to filter-branch. I feel somewhat bad about this, especially since I feel like I learned so much from BFG that I put to good use in filter-repo (which is much more than I can say for filter-branch), but keeping that section presented a few problems: * In order to recommend that people quit using filter-branch, we need to provide them a recomendation for something else to use that can handle all the same types of rewrites. To my knowledge, filter-repo is the only such tool. So it needs to be mentioned. * I don't want to give conflicting recommendations to users * If we recommend two tools, we shouldn't expect users to learn both and pick which one to use; we should explain which problems one can solve that the other can't or when one is much faster than the other. * BFG and filter-repo have similar performance * All filtering types that BFG can do, filter-repo can also do. In fact, filter-repo comes with a reimplementation of BFG named bfg-ish which provides the same user-interface as BFG but with several bugfixes and new features that are hard to implement in BFG due to its technical underpinnings. While I could still mention both tools, it seems like I would need to provide some kind of comparison and I would ultimately just say that filter-repo can do everything BFG can, so ultimately it seems that it is just better to remove that section altogether. Signed-off-by: Elijah Newren <newren@xxxxxxxxx> --- Documentation/git-fast-export.txt | 6 ++-- Documentation/git-filter-branch.txt | 45 +++++++++-------------------- Documentation/git-gc.txt | 17 +++++------ Documentation/git-rebase.txt | 2 +- Documentation/git-replace.txt | 10 +++---- Documentation/git-svn.txt | 4 +-- Documentation/githooks.txt | 7 +++-- contrib/svn-fe/svn-fe.txt | 4 +-- git-filter-branch.sh | 13 +++++++++ 9 files changed, 52 insertions(+), 56 deletions(-) mode change 100755 => 100644 git-filter-branch.sh diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt index cc940eb9ad..784e934009 100644 --- a/Documentation/git-fast-export.txt +++ b/Documentation/git-fast-export.txt @@ -17,9 +17,9 @@ This program dumps the given revisions in a form suitable to be piped into 'git fast-import'. You can use it as a human-readable bundle replacement (see -linkgit:git-bundle[1]), or as a kind of an interactive -'git filter-branch'. - +linkgit:git-bundle[1]), or as a format that can be edited before being +fed to 'git fast-import' in order to do history rewrites (an ability +relied on by tools like 'git filter-repo'). OPTIONS ------- diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt index 6b53dd7e06..e4047d472e 100644 --- a/Documentation/git-filter-branch.txt +++ b/Documentation/git-filter-branch.txt @@ -16,6 +16,20 @@ SYNOPSIS [--original <namespace>] [-d <directory>] [-f | --force] [--state-branch <branch>] [--] [<rev-list options>...] +WARNING +------- +'git filter-branch' has a plethora of pitfalls that can produce non-obvious +manglings of the intended history rewrite (and can leave you with little +time to investigate such problems since it has such abysmal performance). +These safety and performance issues cannot be backward compatibly fixed and +as such, its use is not recommended. Please use an alternative history +filtering tool such as https://github.com/newren/git-filter-repo/[git +filter-repo]. If you still need to use 'git filter-branch', please +carefully read the "Safety" section of the message on the Git mailing list +https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@xxxxxxxxxxxxxx/[detailing +the land mines of filter-branch] and vigilantly avoid as many of the +hazards listed there as reasonably possible. + DESCRIPTION ----------- Lets you rewrite Git revision history by rewriting the branches mentioned @@ -445,37 +459,6 @@ warned. (or if your git-gc is not new enough to support arguments to `--prune`, use `git repack -ad; git prune` instead). -NOTES ------ - -git-filter-branch allows you to make complex shell-scripted rewrites -of your Git history, but you probably don't need this flexibility if -you're simply _removing unwanted data_ like large files or passwords. -For those operations you may want to consider -http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner], -a JVM-based alternative to git-filter-branch, typically at least -10-50x faster for those use-cases, and with quite different -characteristics: - -* Any particular version of a file is cleaned exactly _once_. The BFG, - unlike git-filter-branch, does not give you the opportunity to - handle a file differently based on where or when it was committed - within your history. This constraint gives the core performance - benefit of The BFG, and is well-suited to the task of cleansing bad - data - you don't care _where_ the bad data is, you just want it - _gone_. - -* By default The BFG takes full advantage of multi-core machines, - cleansing commit file-trees in parallel. git-filter-branch cleans - commits sequentially (i.e. in a single-threaded manner), though it - _is_ possible to write filters that include their own parallelism, - in the scripts executed against each commit. - -* The http://rtyley.github.io/bfg-repo-cleaner/#examples[command options] - are much more restrictive than git-filter branch, and dedicated just - to the tasks of removing unwanted data- e.g: - `--strip-blobs-bigger-than 1M`. - GIT --- Part of the linkgit:git[1] suite diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt index 247f765604..0c114ad1ca 100644 --- a/Documentation/git-gc.txt +++ b/Documentation/git-gc.txt @@ -115,15 +115,14 @@ NOTES ----- 'git gc' tries very hard not to delete objects that are referenced -anywhere in your repository. In -particular, it will keep not only objects referenced by your current set -of branches and tags, but also objects referenced by the index, -remote-tracking branches, refs saved by 'git filter-branch' in -refs/original/, reflogs (which may reference commits in branches -that were later amended or rewound), and anything else in the refs/* namespace. -If you are expecting some objects to be deleted and they aren't, check -all of those locations and decide whether it makes sense in your case to -remove those references. +anywhere in your repository. In particular, it will keep not only +objects referenced by your current set of branches and tags, but also +objects referenced by the index, remote-tracking branches, notes saved +by 'git notes' under refs/notes/, reflogs (which may reference commits +in branches that were later amended or rewound), and anything else in +the refs/* namespace. If you are expecting some objects to be deleted +and they aren't, check all of those locations and decide whether it +makes sense in your case to remove those references. On the other hand, when 'git gc' runs concurrently with another process, there is a risk of it deleting an object that the other process is using diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt index 6156609cf7..2f201d85d4 100644 --- a/Documentation/git-rebase.txt +++ b/Documentation/git-rebase.txt @@ -832,7 +832,7 @@ Hard case: The changes are not the same.:: This happens if the 'subsystem' rebase had conflicts, or used `--interactive` to omit, edit, squash, or fixup commits; or if the upstream used one of `commit --amend`, `reset`, or - `filter-branch`. + a full history rewriting command like `filter-repo`. The easy case diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt index 246dc9943c..35595a2cd3 100644 --- a/Documentation/git-replace.txt +++ b/Documentation/git-replace.txt @@ -123,10 +123,10 @@ The following format are available: CREATING REPLACEMENT OBJECTS ---------------------------- -linkgit:git-filter-branch[1], linkgit:git-hash-object[1] and -linkgit:git-rebase[1], among other git commands, can be used to create -replacement objects from existing objects. The `--edit` option can -also be used with 'git replace' to create a replacement object by +linkgit:git-hash-object[1], linkgit:git-rebase[1], and +linkgit:git-filter-repo[1], among other git commands, can be used to +create replacement objects from existing objects. The `--edit` option +can also be used with 'git replace' to create a replacement object by editing an existing object. If you want to replace many blobs, trees or commits that are part of a @@ -148,8 +148,8 @@ pending objects. SEE ALSO -------- linkgit:git-hash-object[1] -linkgit:git-filter-branch[1] linkgit:git-rebase[1] +linkgit:git-filter-repo[1] linkgit:git-tag[1] linkgit:git-branch[1] linkgit:git-commit[1] diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt index 30711625fd..f2762dd5d4 100644 --- a/Documentation/git-svn.txt +++ b/Documentation/git-svn.txt @@ -769,9 +769,9 @@ option for (hopefully) obvious reasons. + This option is NOT recommended as it makes it difficult to track down old references to SVN revision numbers in existing documentation, bug -reports and archives. If you plan to eventually migrate from SVN to Git +reports, and archives. If you plan to eventually migrate from SVN to Git and are certain about dropping SVN history, consider -linkgit:git-filter-branch[1] instead. filter-branch also allows +linkgit:git-filter-repo[1] instead. filter-repo also allows reformatting of metadata for ease-of-reading and rewriting authorship info for non-"svn.authorsFile" users. diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt index 82cd573776..997548f5ed 100644 --- a/Documentation/githooks.txt +++ b/Documentation/githooks.txt @@ -425,9 +425,10 @@ post-rewrite This hook is invoked by commands that rewrite commits (linkgit:git-commit[1] when called with `--amend` and -linkgit:git-rebase[1]; currently `git filter-branch` does 'not' call -it!). Its first argument denotes the command it was invoked by: -currently one of `amend` or `rebase`. Further command-dependent +linkgit:git-rebase[1]; however, full-history (re)writing tools like +linkgit:git-fast-import[1] or linkgit:git-filter-repo[1] typically do +not call it!). Its first argument denotes the command it was invoked +by: currently one of `amend` or `rebase`. Further command-dependent arguments may be passed in the future. The hook receives a list of the rewritten commits on stdin, in the diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt index a3425f4770..19333fc8df 100644 --- a/contrib/svn-fe/svn-fe.txt +++ b/contrib/svn-fe/svn-fe.txt @@ -56,7 +56,7 @@ line. This line has the form `git-svn-id: URL@REVNO UUID`. The resulting repository will generally require further processing to put each project in its own repository and to separate the history -of each branch. The 'git filter-branch --subdirectory-filter' command +of each branch. The 'git filter-repo --subdirectory-filter' command may be useful for this purpose. BUGS @@ -67,5 +67,5 @@ The exit status does not reflect whether an error was detected. SEE ALSO -------- -git-svn(1), svn2git(1), svk(1), git-filter-branch(1), git-fast-import(1), +git-svn(1), svn2git(1), svk(1), git-filter-repo(1), git-fast-import(1), https://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt diff --git a/git-filter-branch.sh b/git-filter-branch.sh old mode 100755 new mode 100644 index 5c5afa2b98..7b1865c1d5 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -83,6 +83,19 @@ set_ident () { finish_ident COMMITTER } +if [ -z "$FILTER_BRANCH_SQUELCH_WARNING" -a \ + -z "$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS" ]; then + cat <<EOF +WARNING: git-filter-branch has a glut of gotchas generating mangled history + rewrites. Please use an alternative filtering tool such as 'git + filter-repo' (https://github.com/newren/git-filter-repo/) instead. + See the filter-branch manual page for more details; to squelch + this warning and pause, set FILTER_BRANCH_SQUELCH_WARNING=1. + +EOF + sleep 5 +fi + USAGE="[--setup <command>] [--subdirectory-filter <directory>] [--env-filter <command>] [--tree-filter <command>] [--index-filter <command>] [--parent-filter <command>] [--msg-filter <command>] -- 2.23.0.3.gcc10030edf.dirty