Re: [PATCH] docs: add filter-branch note about The BFG

Roberto Tyley <roberto.tyley@xxxxxxxxx> writes:

> The BFG is a tool specifically designed for the task of removing
> unwanted data from Git repository history - a common use-case for which
> git-filter-branch has been the traditional workhorse.
>
> It's beneficial to let users know that filter-branch has an alternative
> here:
>
> * speed : The BFG is 10-50x faster
>   http://rtyley.github.io/bfg-repo-cleaner/#speed
> * complexity of configuration : filter-branch is a very flexible tool,
>   but demands very careful usage in order to get the desired results
>   http://rtyley.github.io/bfg-repo-cleaner/#examples
>
> Obviously, filter-branch has its advantages too - it permits very
> complex rewrites, and doesn't require a JVM - but for the common
> use-case of deleting unwanted data, it's helpful to users to be aware
> that an alternative exists.
>
> The BFG was released under the GPL in February 2013, and has since seen
> widespread production use (The Guardian, RedHat, Google, UK Government
> Digital Service), been tested against large repos (~300K commits, ~5GB
> packfiles) and received significant positive feedback from users:
>
> http://rtyley.github.io/bfg-repo-cleaner/#feedback
>
> Signed-off-by: Roberto Tyley <roberto.tyley@xxxxxxxxx>
> ---
>  Documentation/git-filter-branch.txt | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
> index e4c8e82..918e965 100644
> --- a/Documentation/git-filter-branch.txt
> +++ b/Documentation/git-filter-branch.txt
> @@ -18,6 +18,12 @@ SYNOPSIS
>  
>  DESCRIPTION
>  -----------
> +
> +NOTE: For simply removing unwanted data from repository history, you may
> +want to use link:http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner]
> +instead - it's generally faster and simpler for eliminating large files
> +or private data.
> +

My understanding is that the primary speed-up of BFG comes from its
design decision to filter each blob only once, unlike filter-branch,
which allows you to (and forces you to) decide how the same blob is
filtered depending on where it appears in space (i.e. the path in
the project's directory hierarchy) and time (i.e. the commit it
appears in).  For "removing unwanted data", I think nobody needs the
flexibility to filter differently depending on the context, and it
is a good idea to refer those who only need to remove unwanted data
to BFG.
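
To make the "each blob only once" point concrete, here is a toy
sketch in Python.  It is not how BFG or filter-branch is actually
implemented; the in-memory `history`, the `keep` policy and both
function names are made up for illustration, and `blob_id` is only a
stand-in for a real Git object name.  It shows that when the verdict
depends on blob content alone, it can be cached per blob instead of
being recomputed for every (commit, path) occurrence.

```python
import hashlib

def blob_id(data: bytes) -> str:
    # Stand-in for a Git blob ID: a plain content hash (not Git's exact object hashing).
    return hashlib.sha1(data).hexdigest()

# Hypothetical toy history: each commit maps path -> blob content.
BIG = b"x" * 1000
history = [
    {"README": b"hello\n", "assets/big.bin": BIG},
    {"README": b"hello\n", "assets/big.bin": BIG, "src/main.c": b"int main(){}\n"},
    {"README": b"hello again\n", "copy/big.bin": BIG, "src/main.c": b"int main(){}\n"},
]

def keep(data: bytes) -> bool:
    # Hypothetical policy: drop any blob larger than 100 bytes.
    return len(data) <= 100

def filter_per_occurrence(history, keep):
    """Decide again for every (commit, path) occurrence of a blob."""
    calls = 0
    out = []
    for commit in history:
        new_tree = {}
        for path, data in commit.items():
            calls += 1
            if keep(data):
                new_tree[path] = data
        out.append(new_tree)
    return out, calls

def filter_once_per_blob(history, keep):
    """Decide once per distinct blob and reuse the cached verdict."""
    calls = 0
    verdict = {}  # blob id -> keep or drop
    out = []
    for commit in history:
        new_tree = {}
        for path, data in commit.items():
            oid = blob_id(data)
            if oid not in verdict:
                calls += 1
                verdict[oid] = keep(data)
            if verdict[oid]:
                new_tree[path] = data
        out.append(new_tree)
    return out, calls

per_occurrence, n_occurrence_calls = filter_per_occurrence(history, keep)
per_blob, n_blob_calls = filter_once_per_blob(history, keep)
print(n_occurrence_calls, n_blob_calls, per_occurrence == per_blob)
```

Both functions produce the same rewritten history here, but the
second consults the policy once per distinct blob rather than once
per occurrence.  The price is that the policy cannot look at the
path or the commit, which is the flexibility discussed above.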

Having said that, "You may want to use ..." without giving the
reason why we recommend the other tool leaves the reader wondering
what the pros and cons are, and why git-filter-branch exists if BFG
is the first thing its document recommends even before it describes
what git-filter-branch is and does.  "You may want to check ..."
might be slightly better, but probably not by much.

Rewriting the "it's generally faster ..." part to give a bit more
information, so that readers can weigh the pros and cons themselves,
may be needed.

>  Lets you rewrite Git revision history by rewriting the branches mentioned
>  in the <rev-list options>, applying custom filters on each revision.
>  Those filters can modify each tree (e.g. removing a file or running
> @@ -393,7 +399,7 @@ git filter-branch --index-filter \
>  Checklist for Shrinking a Repository
>  ------------------------------------
>  
> -git-filter-branch is often used to get rid of a subset of files,
> +git-filter-branch can be used to get rid of a subset of files,
>  usually with some combination of `--index-filter` and
>  `--subdirectory-filter`.  People expect the resulting repository to
>  be smaller than the original, but you need a few more steps to
> @@ -429,6 +435,12 @@ warned.
>    (or if your git-gc is not new enough to support arguments to
>    `--prune`, use `git repack -ad; git prune` instead).
>  
> +SEE ALSO
> +--------
> +link:http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner]
> +- a tool specifically designed for removing unwanted data from Git
> +repository history.
> +
>  GIT
>  ---
>  Part of the linkgit:git[1] suite



