Re: Import/Export as a fast way to purge files from Git?

Jeff King <peff@xxxxxxxx> · Sun, 23 Sep 2018 13:04:04 -0400

On Sun, Sep 23, 2018 at 03:53:38PM +0000, brian m. carlson wrote:

> I suspect you're gaining speed mostly because you're running three
> processes total instead of at least one process (sh) per commit.  So I
> don't think there's anything that Git can do to make this faster on our
> end without a redesign.

It's not just the process startup overhead that makes it faster. Using
multiple processes means they have to communicate somehow. In this case,
git-read-tree is writing out the whole index for each commit, which
git-rm reads in and modifies, and then git-write-tree finally converts
back to a tree for git-commit-tree to use. In addition to the raw CPU of
that work, there's a bunch of latency as each step is performed
serially.
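
To make that per-commit round trip concrete, here is a rough sketch of
the steps. It is not filter-branch's actual code: it only handles linear
history, and the throwaway repository, the file name secret.bin, the
scratch index path, and the branch name "rewritten" are all made up for
illustration.

```shell
#!/bin/sh
# Throwaway repository with two commits (hypothetical demo data).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
echo keep >keep.txt
echo secret >secret.bin
git add keep.txt secret.bin
git commit -q -m "first"
echo more >>keep.txt
git commit -q -a -m "second"

# Work in a scratch index so the repository's real index is left alone.
GIT_INDEX_FILE="$repo/.git/scratch-index"
export GIT_INDEX_FILE

parent=
for commit in $(git rev-list --reverse HEAD); do
	# 1. Write out the whole index for this commit.
	git read-tree "$commit"
	# 2. Read the index back in, drop the path, write it out again.
	git rm -q --cached --ignore-unmatch secret.bin
	# 3. Convert the index back into a tree object.
	tree=$(git write-tree)
	# 4. Wrap the tree in a new commit, reusing the old message.
	parent=$(git log -1 --format=%B "$commit" |
		git commit-tree "$tree" ${parent:+-p "$parent"})
done
unset GIT_INDEX_FILE

# Point a new branch at the rewritten history.
git update-ref refs/heads/rewritten "$parent"
```

Every iteration spawns several processes, and each one has to finish
before the next can start, which is where the serial latency comes from.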

Whereas in the proposed pipeline, fast-export is writing out a diff and
fast-import is turning that directly back into tree objects. And both
processes are proceeding independently, so you benefit from multiple
cores.
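
For comparison, a minimal sketch of that kind of pipeline, assuming it
is roughly "fast-export, a line filter, fast-import". The demo
repository and the file name secret.bin are hypothetical, and the grep
is deliberately naive: it would also drop an identical-looking line
inside a commit message, and it does not handle quoted paths.

```shell
#!/bin/sh
# Throwaway repository with two commits (hypothetical demo data).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
echo keep >keep.txt
echo secret >secret.bin
git add keep.txt secret.bin
git commit -q -m "first"
echo more >>keep.txt
git commit -q -a -m "second"

# Three processes in total, all running concurrently:
#  - fast-export emits each commit as a list of file changes
#    (--no-data refers to blobs by object id instead of inlining them),
#  - grep drops the "M <mode> <id> secret.bin" lines,
#  - fast-import builds new trees and commits from what is left.
# --force is needed because the rewritten branch tip is not a
# descendant of the old one.
git fast-export --no-data --all |
	grep -v '^M [0-9]* [0-9a-f]* secret\.bin$' |
	git fast-import --force --quiet
```

Afterwards the branch points at the rewritten commits, while the index
and working tree still reflect the old state and need to be reset
separately.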

Which isn't to say I really disagree with "Git can't really make this
faster". filter-branch has a ton of power to let you replay arbitrary
commands (including non-Git commands!), so the speed tradeoff in its
approach is very intentional. If we could modify the index in-place that
would probably make it a little faster, but that probably counts as
"redesign" in your statement. ;)

-Peff