Re: git filter-branch should run git gc --auto

On Jan 22, 2008, at 9:46 PM, Junio C Hamano wrote:

> Kevin Ballard <kevin@xxxxxx> writes:
>
>> I just glanced at git-filter-branch.sh (and I must say I was
>> incredibly surprised to find out it was a shell script) and it seems
>> it never runs git-gc or git-repack. Doesn't that end up with the same
>> problems as git-svn sans git-repack when filtering a large number of
>> commits? I was just thinking, if I were to git-filter-branch on my
>> massive repo (in fact, the same repo that started this thread, with
>> over 33000 commits in the upstream svn repo), even if I just do
>> something as simple as change the commit msg, won't I end up with
>> thousands of unreachable objects? I shudder to think how many
>> unreachable objects I would have if I pruned the entire dports
>> directory off of the tree.
>>
>> Am I missing something, or does git-filter-branch really not do any
>> garbage collection? I tried reading the source, but complex bash
>> scripts are almost as bad as perl in terms of readability.
>
> Theoretically yes, and it largely depends on what you do, but
> filter-branch goes over the objects that already exist in your
> repository, and hopefully you won't be rewriting the majority of
> them.
>
> So the impact of not repacking is probably much less painful in
> practice.
>
> But again as I said, it largely depends on what you do in your
> filter.  If you are upcasing (or converting to NFD ;-)) the
> contents of all of your blob objects, you would certainly want
> to repack every once in a while.


I'm actually weighing what it would cost to switch MacPorts to git (not that it will ever happen; too many anonymous people pull from svn trunk). Right now the svn trunk contains one subfolder for the source code and another for all 4400+ Portfiles. In such a hypothetical move I'd want to split that up, probably into two unrelated branches. Doing so would mean running git-filter-branch over a linear history of 31580 commits, with a tree filter to prune the dports directory away and a msg filter to remove the svn-id stuff that git-svn left behind. That means every single commit object would change, as would the root tree object of every commit: about 63160 objects in all. I'd also have to figure out some way to drop entirely the commits that only touch the dports directory. Then I'd have to do it all again with the opposite tree filter (prune everything but the dports directory and move its contents up one level) and the same msg filter.

Granted, if I do the first rewrite in a branch, it leaves no unreachable objects (the originals are still referenced), but the second one definitely would. And were I instead to clone the repository and do the two rewrites in separate repos (which is perfectly legitimate; otherwise I'd have to clone after everything else and then delete branches), both rewrites would leave thousands of objects unreachable.
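For illustration, the two rewrites described above might look roughly like the sketch below. It is run against a throwaway three-commit repository rather than the real tree, and it is only one possible approach, not a tested migration recipe. The `base/` directory, the branch names `base-only` and `ports-only`, and the svn URL are made up for the example. `--prune-empty` and `--subdirectory-filter` are options documented for current versions of git filter-branch and may not be present in the version discussed in this thread; `FILTER_BRANCH_SQUELCH_WARNING` likewise only matters on newer git.

```shell
#!/bin/sh
# Sketch only: split a toy repo into a "source" branch and a "ports" branch.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer git: skip the deprecation notice

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# commit <path> <subject> <svn-rev>: append to a file and commit it with a
# fake git-svn-id line (hypothetical URL) like the ones git-svn leaves behind.
commit() {
    mkdir -p "$(dirname "$1")"
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2" -m "git-svn-id: https://svn.example.invalid/trunk@$3 0000"
}

commit base/main.c         "add source"   1
commit dports/foo/Portfile "add port"     2
commit base/main.c         "tweak source" 3

git branch base-only
git branch ports-only

# The msg filter reads the old message on stdin and prints the new one.
strip_ids='sed -e "/^git-svn-id:/d"'

# Rewrite 1: drop dports/ from every commit; --prune-empty discards commits
# that end up changing nothing (those that only touched dports/).
git filter-branch -f --prune-empty \
    --tree-filter 'rm -rf dports' \
    --msg-filter "$strip_ids" -- base-only

# Rewrite 2: keep only dports/ and make it the new project root.
# -f is needed because the first run left backups under refs/original/.
git filter-branch -f --prune-empty \
    --subdirectory-filter dports \
    --msg-filter "$strip_ids" -- ports-only

echo "base-only commits:  $(git rev-list --count base-only)"
echo "ports-only files:   $(git ls-tree -r --name-only ports-only)"
```

On the real history a tree filter checks out every commit, so an index filter would probably be much faster; the tree filter is used here only because it is what the message describes.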

I'd suggest a patch to run git gc --auto, but it looks like you just did exactly that in a subsequent email. As for your comments about the reflogs, can't I disable recording them, at least temporarily? I'd rather clean up after myself as I go than balloon the repository and collapse it in a single operation at the end.
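As a rough sketch of what "cleaning up as I go" could look like with stock git commands: `core.logAllRefUpdates` controls whether reflog entries are recorded, and the helper below (the name `drop_rewrite_leftovers` is made up for this example) removes the things that keep rewritten-away objects alive before pruning. The demo repository and its single amended commit are stand-ins for a real filter-branch run, and this is an assumption about a workable procedure, not something the thread prescribes.

```shell
#!/bin/sh
# Sketch only: turn off reflog recording, then prune unreachable objects.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# Stop recording new reflog entries in this repository.
git config core.logAllRefUpdates false

echo one > file
git add file
git commit -q -m "first"
old=$(git rev-parse HEAD)
git commit -q --amend -m "first, reworded"   # the commit in $old is now unreachable

# Hypothetical helper: drop what still pins old objects, then prune.
drop_rewrite_leftovers() {
    # filter-branch keeps backups of the rewritten refs under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do git update-ref -d "$ref"; done
    # Expire reflog entries that were recorded before logging was disabled.
    git reflog expire --expire=now --all
    # Repack and delete unreachable objects regardless of their age.
    git gc --quiet --prune=now
}

drop_rewrite_leftovers

if git cat-file -e "$old" 2>/dev/null; then
    echo "old commit still present"
else
    echo "old commit pruned"
fi
```

Note that a plain `git gc --auto` only does work once the loose-object or pack thresholds are exceeded, and by default it leaves recently created unreachable objects alone, which is why the sketch uses `--prune=now` explicitly.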

-Kevin Ballard

--
Kevin Ballard
http://kevin.sb.org
kevin@xxxxxx
http://www.tildesoft.com

