Re: Huge performance bottleneck reading packs

On Thu, Oct 13, 2016 at 09:17:34AM +0200, Vegard Nossum wrote:

> Oops. I disabled gc a while ago; one reason I did that is that it takes
> a long time to run and it has a tendency to kick in at the worst time. I
> guess I should really put it in cron then.
> 
> I'm not sure if this is related, but I also had a problem with GitPython
> and large pack files in the past (" ValueError: Couldn't obtain fanout
> table or warning: packfile ./objects/pack/....pack cannot be accessed")

Sounds like they didn't correctly implement the extra index fanout that
happens for packs above 2G. The old Grit library had a similar bug.
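For illustration only, here is a minimal Python sketch of the part of a
version-2 pack index that a reader has to handle for large packs. It
assumes SHA-1 (20-byte) object names and the v2 .idx layout: an 8-byte
header, a 256-entry fanout table, the object names, the CRC32 table,
then a table of 4-byte offsets. An entry in that table with its high bit
set is not an offset at all but an index into a following table of
8-byte offsets. The function name is hypothetical, and this is not a
claim about how GitPython or Grit actually read the file.

```python
import struct

IDX_V2_MAGIC = b"\xfftOc"  # first four bytes of a version-2 .idx file


def read_idx_v2_offsets(data: bytes) -> list[int]:
    """Return pack offsets, in index order, from the bytes of a v2 .idx file.

    Assumes 20-byte (SHA-1) object names.
    """
    if data[:4] != IDX_V2_MAGIC:
        raise ValueError("not a version-2 pack index")
    (version,) = struct.unpack(">I", data[4:8])
    if version != 2:
        raise ValueError(f"unsupported index version {version}")

    # 256 cumulative counts; the last one is the total number of objects.
    fanout = struct.unpack(">256I", data[8:8 + 1024])
    count = fanout[255]

    pos = 8 + 1024
    pos += count * 20  # skip the sorted object names
    pos += count * 4   # skip the CRC32 table

    small = struct.unpack(f">{count}I", data[pos:pos + 4 * count])
    large_base = pos + 4 * count  # start of the 8-byte offset table

    offsets = []
    for entry in small:
        if entry & 0x80000000:
            # High bit set: the low 31 bits index the 8-byte offset table.
            i = entry & 0x7FFFFFFF
            start = large_base + 8 * i
            (offset,) = struct.unpack(">Q", data[start:start + 8])
            offsets.append(offset)
        else:
            offsets.append(entry)
    return offsets
```

A reader that treats every 4-byte entry as a plain offset works until a
pack grows past 2G, which would be consistent with the failure described
above.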

> and I have pack.packSizeLimit set to 512m to fix that.
> Although the whole repo is 17G so I guess it shouldn't be necessary to
> have that many pack files.

Using packSizeLimit does "solve" that problem, but it comes with its own
set of issues. There is a very good chance that your repository would be
much smaller than 17G as a single packfile, because Git does not allow
deltas across packs, and it does not optimize the placement of objects
to keep delta-related objects in a single pack. So you'll quite often be
storing full copies of objects that could otherwise be stored as a tiny
delta.

You might want to compare the resulting size for a full repack with and
without pack.packSizeLimit.
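
One way to do that comparison, as a rough sketch: the hypothetical
helper below just counts the *.pack files under objects/pack and adds up
their sizes. It assumes a standard repository layout and ignores loose
objects and alternates.

```python
from pathlib import Path


def pack_summary(git_dir: str) -> tuple[int, int]:
    """Return (number of packfiles, total bytes) under <git_dir>/objects/pack."""
    packs = list(Path(git_dir, "objects", "pack").glob("*.pack"))
    return len(packs), sum(p.stat().st_size for p in packs)
```

Run a full repack (for example "git repack -a -d") once with
pack.packSizeLimit set and once after unsetting it, calling
pack_summary(".git") after each run. "git count-objects -v" reports
similar numbers if you would rather not script it.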

But I agree that is not the cause of your thousand packs. They are more
likely the accumulated cruft of a thousand fetches.

-Peff


