Re: filter-branch performance

On Tue, Dec 09, 2014 at 07:52:33PM +0100, Henning Moll wrote:

> i am running this command
> 
> git filter-branch --env-filter 'export
> GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
> GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"'
> --prune-empty --tag-name-filter cat -- --all
> 
> in a repository which i copied to /dev/shm before. According to "top", the
> git process only consumes about 5 percent of the CPU. The load is between
> 0.70 and 1.00.
> 
> I assume that there is a lot of process forking going on. Could that be the
> cause?

Yes. filter-branch is a shell script, and it is probably running
multiple git commands per commit it is filtering.

> Any ideas how to further improve?

In your case you are not touching the tree contents at all. Last time I
looked into this, I believe that filter-branch always loaded the index
for each commit, even if no --index-filter is being used. So teaching
filter-branch to optimize this case would be one strategy.

Another is to try using "git fast-export | git fast-import", and munging
the data stream in between. That may be more work, depending on how fancy
you want to get with accurate parsing (look into fast-export's
--no-data, which omits blob data; that should make things faster and
make hacky context-less parsing less likely to cause problems).
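
As a rough illustration of the "munging in between" idea, here is a minimal
sketch of a stream filter in Python. It assumes the stream looks like what
fast-export emits (each commit has an "author" line followed by a
"committer" line, and payloads are announced by "data <count>"). The
function name rewrite_stream is made up for this example. It skips over
"data" payloads by byte count, so a commit message that happens to contain
a line starting with "committer " is left alone. It does not try to
reproduce --prune-empty or any tag handling.

```python
def rewrite_stream(inp, out):
    """Copy a fast-export stream from inp to out (both binary file
    objects), replacing each commit's committer identity and date with
    those of its author."""
    author = None  # text after "author " for the commit being read
    while True:
        line = inp.readline()
        if not line:
            break  # end of stream
        if line.startswith(b"data "):
            # "data <count>" is followed by exactly <count> raw bytes
            # (a commit or tag message here). Copy them untouched so
            # their content is never mistaken for a command.
            out.write(line)
            count = line[len(b"data "):].strip()
            if count.isdigit():
                out.write(inp.read(int(count)))
            continue
        if line.startswith(b"commit "):
            author = None  # a new commit begins; forget the old author
        elif line.startswith(b"author "):
            author = line[len(b"author "):]  # "Name <email> time tz\n"
        elif line.startswith(b"committer ") and author is not None:
            line = b"committer " + author
        out.write(line)
```

To use it as a filter, one could save it in a script (say rewrite.py, a
hypothetical name) that ends by calling
rewrite_stream(sys.stdin.buffer, sys.stdout.buffer) after "import sys", and
run something like
"git fast-export --no-data --all | python3 rewrite.py | git fast-import --force"
on a scratch copy of the repository first, comparing the result before
trusting it.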

-Peff