Re: git gc expanding packed data?

On Wed, Aug 5, 2009 at 11:39 PM, Nicolas Pitre<nico@xxxxxxx> wrote:
> On Tue, 4 Aug 2009, Hin-Tak Leung wrote:
>
>> I cloned gcc's git about a week ago to work on some problems I have
>> with gcc on minor platforms, just plain 'git clone
>> git://gcc.gnu.org/git/gcc.git gcc', and ran 'git fetch' about daily, and
>> 'git rebase origin' from time to time. I don't have local changes,
>> just following and monitoring what's going on in gcc. So after a week,
>> I thought I'd do a 'git gc'. Then it went very bizarre.
>>
>> Before I started 'git gc', the whole of .git was about 700MB and
>> .git/objects/pack was a bit under 600MB, with a few other directories
>> under .git/objects at tens of KB and a few at 30000-40000KB, and the
>> checkout was, well, the size of the gcc source code. But after I started
>> 'git gc', the message stayed at 'counting objects' at about 900,000
>> for a long time, while a lot of directories under .git/objects/ got a
>> bit large, and .git blew up to at least 7GB with a lot of small files
>> under .git/objects/*/. Seeing that I would run out of disk space,
>> I killed the whole lot and ran git clone again, since I don't have any
>> local changes and there is nothing to lose.
>>
>> I am running git version 1.6.2.5 (fedora 11). Is there any reason why
>> 'git gc' does that?
>
> There is probably a reason, although a bad one for sure.
>
> Well... OK.
>
> It appears that the git installation serving clone requests for
> git://gcc.gnu.org/git/gcc.git generates lots of unreferenced objects. I
> just cloned it and the pack I was sent contains 1383356 objects (can be
> determined with 'git show-index < .git/objects/pack/*.idx | wc -l').
> However, there are only 978501 actually referenced objects in that
> cloned repository ( 'git rev-list --all --objects | wc -l').  That makes
> for 404855 useless objects in the cloned repository.
>
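
For reference, the two counts above can be scripted roughly as follows.
This is only a sketch: the function names are made up for illustration,
it assumes it is run from the top of a work tree, and the difference is
an estimate because it ignores loose objects and any object stored in
more than one pack.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of git.

# Count objects stored in packs: 'git show-index' prints one line per
# object for the pack index it reads on stdin.
packed_objects() {
    for idx in .git/objects/pack/*.idx; do
        [ -e "$idx" ] || continue   # no packs at all: glob did not match
        git show-index < "$idx"
    done | wc -l
}

# Count objects reachable from any ref (commits, trees, blobs, tags).
referenced_objects() {
    git rev-list --all --objects | wc -l
}

# Estimate of unreferenced objects: packed count minus referenced count.
unreferenced_estimate() {
    echo $(( $1 - $2 ))
}

# Usage:
#   unreferenced_estimate "$(packed_objects)" "$(referenced_objects)"
```

The loop is there because a redirect from '*.idx' only works when the
repository has exactly one pack, as a fresh clone does.
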
> Now git has a safety mechanism to _not_ delete unreferenced objects
> right away when running 'git gc'.  By default unreferenced objects are
> kept around for a period of 2 weeks.  This is to make it easy for you to
> recover accidentally deleted branches or commits, or to avoid a race
> where a just-created object that has not yet been referenced could be
> deleted by a 'git gc' process running in parallel.
>
> So to give that grace period to packed but unreferenced objects, the
> repack process pushes those unreferenced objects out of the pack into
> their loose form so they can be aged and eventually pruned.  Objects
> becoming unreferenced are usually not that many though.  Having 404855
> unreferenced objects is quite a lot, and being sent those objects in the
> first place via a clone is stupid and a complete waste of network
> bandwidth.
>
> Does anyone have an idea of the git version running on gcc.gnu.org?  It
> is certainly buggy and needs fixing.
>
> Anyway... To solve your problem, you simply need to run 'git gc' with
> the --prune=now argument to disable that grace period and get rid of
> those unreferenced objects right away (safe only if no other git
> activities are taking place at the same time which should be easy to
> ensure on a workstation).  The resulting .git/objects directory size
> will shrink to about 441 MB.  If the gcc.gnu.org git server was doing
> its job properly, the size of the clone transfer would also be
> significantly smaller, meaning around 414 MB instead of the current 600+
> MB.
>
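
A rough way to check the effect of that suggestion is to measure
.git/objects before and after the prune. The sketch below assumes a git
new enough to support 'git -C' (1.8.5 or later) and a POSIX 'du'; the
function names are hypothetical, and the sizes it prints will depend on
the repository, so no particular numbers are implied.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of git.

# Print the on-disk size of a repository's object store in KiB.
# $1 is the top of the work tree (defaults to the current directory).
objects_kib() {
    du -sk "${1:-.}/.git/objects" | cut -f1
}

# Run 'git gc --prune=now' and report the object store size before and
# after. Only safe when nothing else is using the repository.
gc_prune_now() {
    repo=${1:-.}
    before=$(objects_kib "$repo")
    git -C "$repo" gc --prune=now --quiet || return 1
    after=$(objects_kib "$repo")
    echo "before=${before}KiB after=${after}KiB"
}

# Usage:
#   gc_prune_now /path/to/gcc
```
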
> And BTW, using 'git gc --aggressive' with a later git version (or
> 'git repack -a -f -d --window=250 --depth=250') gives me a .git/objects
> directory size of 310 MB, meaning that the actual repository with all
> the trunk history is _smaller_ than the actual source checkout.  If that
> repository was properly repacked on the server, the clone data transfer
> would be 283 MB.  This is less than half the current clone transfer
> size.
>
>
> Nicolas
>

'git gc --prune=now' does work, but 'git gc --prune=now --aggressive'
(before) and 'git gc --aggressive' (after) both create very large
(>2GB; I stopped it) packs from the ~400MB-600MB packed objects. I
noted that you specifically wrote 'with a later git version' -
presumably there is some sort of known and fixed issue there? Just
curious.

I guess --aggressive doesn't always save space...

Hin-Tak