Re: git packs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Sat, 10 Nov 2007, bob wrote:
> 
> The reason that I ask is that I have been playing different
> scenarios using git 1.5.3.5 under MacOSX 10.4.10 mostly
> all day and every time that
> 
> A) 	a file approaches or exceeds 2gig on an 'add', it
> 	results in:
> 		fatal: Out of memory? mmap failed: Cannot allocate memory

Git wants to handle single files as one single entity, so single big files 
really do end up being very painful. The costs of compressing them and 
generating deltas would probably get prohibitively high *anyway*, but it 
does mean that if you have gigabyte files, you do want a 64-bit VM.

I thought OS X could do 64 bits these days. Maybe not.

Anyway, that explains the "cannot allocate memory". Git simply wants to 
mmap the whole file. You don't have enough VM space for it.

(And if you seriously want to work with multi-gigabyte files, git probbaly 
isn't going to perform wonderfully well, even if it *should* work fine if 
you just have a full 64-bit environment that allows the mmap).

> B) 	the repository size less the .git subdirectory approaches
> 	4gig on a 'fetch' it results in:
> 
> 	Resolving 3356 deltas...
> 	fatal: serious inflate inconsistency: -3 (unknown compression method)

That sounds really broken. I'm not seeing what would cause that, apart 
from some really bad data corruption and/or broken zlib implementation. 
But if the pack-file really is 2GB+ in size, I could imagine some sign 
issues cropping up.

git will generally use "unsigned long" (which is probably just 32-bit on 
your setup), but since git in those circumstances would be limited by the 
size of the VM _anyway_, that's not really much of a limitation (although 
probably broken on the crazy Windows "LLP64" model). But maybe we have 
some place where we use a signed thing, or zlib does, and I could see that 
causing breakage.

But that code-sequence really should never even come *close* to the 31-bit 
limit, as long as the individual objects themselves aren't bigger than the 
available VM space (and git currently assumes "unsigned long" is 
sufficiently big to cover the VM space, which is not technically correct, 
but should be fine on OS X too).

That said, we should use "off_t" in that function. I suspect we have a 
number of people (read: me) who have grown too used to living in a 64-bit 
world..

> I have been testing on my laptop which has a 32-bit Intel Core Duo.

Ok, so you're 32-bit limited even if there is were to be some 64-bit 
support for OS X.

> Also, I have run the same tests on a dual quad-core Intel processor
> which is 64 bit, (but not sure that Apple uses the 64 bits in 10.4.10).  I
> get the same results as above.

I'm pretty sure OS X defaults to a 32-bit environment, but has at least 
*some* 64-bit support. It would definitely need to be enabled explicitly 
(since they made the *insane* decision to move over to Intel laptop chips 
six months before they got 64-bit support! Somebody at Apple is a total 
idiot, and should get fired).

So it would be interesting to hear if a 64-bit build would make a 
difference.

> The zlib is at the latest revision of 1.2.3 and gcc is at 4.0.1
> which from what I can tell supports large files, because 'off_t' is 8 bytes
> which is the size used for a 'stat' file size.

See above: single files are size-limited, but with a large off_t like 
yours, you should be fine. Except we may have screwed up.

> I am just wondering if these size limitations exist for MacOSX
> or maybe I am doing something wrong (which is probably
> the case).

We *have* had issues with broken implementations of "pread()" on some 
systems.  

You could try setting NO_PREAD in the Makefile and compiling with the 
compatibility function.. That's the only thing that comes to mind as being 
worth trying in that area.

And if you have some script to generate the repository (ie you aren't 
using "live data", but are testing the limits of the system), if you can 
make that available, so that people with non-OSX environments can test, 
that would be interesting..

I certainly have some 32-bit environments too (old linux boxes), but I'm 
too lazy to write a test-case, so I was hoping you'd be using some simple 
scripts that I could just test and see if I can see the behaviour you 
describe myself.

That said, I have worked with a 3GB pack-file (one of the KDE trial 
repos). That worked fine. But git does tend to want a *lot* of memory for 
really big repositories, so I suspect that if you actually work with 2GB+ 
pack-files, you'll be wanting a 64-bit environment just because you'll be 
wanting more than 2GB of physical RAM in order to be able to access it 
efficiently.

			Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux