Re: Problem with large files on different OSes

On Wed, 27 May 2009, Linus Torvalds wrote:
> 
> I'll see if I can make us handle the "big file without diff" case better 
> by chunking.

Hmm. No. Looking at it some more, we could add some nasty code to do 
_some_ things chunked (like adding a new file as a single object), but it 
doesn't really help. For any kind of useful thing, we'd need to handle the 
"read from pack" case in multiple chunks too, and that gets really nasty 
really quickly.

The whole "each object as one allocation" design is pretty core, and it 
looks pointless to have a few special cases, when any actual relevant use 
would need a whole lot more than the few simple ones.

Git really doesn't like big individual objects.
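
To make that concrete, here's a rough standalone sketch (not git code, 
just an illustration, using zlib directly) of the one-allocation pattern: 
the whole file in one buffer, the whole deflated result in another. A 2GB 
file wants roughly 4GB of address space this way.

/* Rough standalone illustration of the one-allocation pattern (NOT git
 * code): the whole input in one buffer, the whole deflated result in
 * another.  Build with: cc sketch.c -lz */
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

int main(int argc, char **argv)
{
	FILE *f;
	long size;
	unsigned char *buf, *out;
	uLongf outlen;

	if (argc != 2 || !(f = fopen(argv[1], "rb")))
		return 1;
	fseek(f, 0, SEEK_END);
	size = ftell(f);
	rewind(f);

	buf = malloc(size);		/* the whole object, one allocation */
	if (!buf || fread(buf, 1, size, f) != (size_t)size)
		return 1;

	outlen = compressBound(size);
	out = malloc(outlen);		/* and the whole deflated result, too */
	if (!out || compress2(out, &outlen, buf, size, Z_BEST_SPEED) != Z_OK)
		return 1;

	printf("%ld bytes -> %lu bytes deflated\n", size, (unsigned long)outlen);
	free(buf);
	free(out);
	return 0;
}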

I've occasionally thought about handling big files as multiple big 
objects: we'd split them into a "pseudo-directory" (it would have some new 
object ID), and then treat them as a magical special kind of directory 
that just happens to be represented as one large file on the filesystem.

That would mean that if you have a huge file, git internally would never 
think of it as one big file, but as a collection of many smaller objects. 
By just making the point where you break up files a consistent rule 
("always break into 256MB pieces"), it would be a well-behaved design: the 
same big file would always converge on the same set of objects, no matter 
how it was created.
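
Mechanically the splitting rule itself is trivial - something like the 
standalone sketch below (again not git code; it uses a toy FNV hash where 
git would use real object IDs). Because the boundaries depend only on file 
offsets, the same content always falls into the same pieces no matter how 
the file got written:

/* Standalone sketch of the "break big files into fixed-size pieces" rule
 * (NOT git code).  Chunk boundaries depend only on file offsets, so the
 * same content always splits the same way.  Toy FNV-1a hash stands in
 * for real object IDs. */
#include <stdio.h>
#include <stdint.h>

#define CHUNK_SIZE (256UL * 1024 * 1024)	/* "always break into 256MB pieces" */

int main(int argc, char **argv)
{
	FILE *f;
	unsigned char buf[65536];
	uint64_t hash = 14695981039346656037ULL;	/* FNV offset basis */
	unsigned long long in_chunk = 0, chunk_no = 0;
	size_t n, i;

	if (argc != 2 || !(f = fopen(argv[1], "rb")))
		return 1;

	while ((n = fread(buf, 1, sizeof(buf), f)) > 0) {
		for (i = 0; i < n; i++) {
			hash = (hash ^ buf[i]) * 1099511628211ULL;	/* FNV prime */
			if (++in_chunk == CHUNK_SIZE) {
				printf("chunk %llu: %016llx\n", chunk_no++,
				       (unsigned long long)hash);
				hash = 14695981039346656037ULL;
				in_chunk = 0;
			}
		}
	}
	if (in_chunk)	/* final, possibly short, piece */
		printf("chunk %llu: %016llx\n", chunk_no,
		       (unsigned long long)hash);
	fclose(f);
	return 0;
}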

HOWEVER.

While that would fit in the git design (ie it would be just a fairly 
straightforward extension - another level of indirection, kind of the way 
we added subprojects), it would still be a rewrite of some core stuff. The 
actual number of lines might not be too horrid, but quite frankly, I 
wouldn't want to do it personally. It would be a lot of work with lots of 
careful special case handling - and no real upside for normal use.

So I'm kind of down on it. I would suggest just admitting that git isn't 
very good at big individual files - especially not if you have a limited 
address space.

So "don't do it then" or "make sure you are 64-bit and have lots of 
memory if you do it" may well be the right solution.

[ And it's really really sad how Apple migrated to x86-32. It was totally 
  unforgivably stupid, and I said so at the time. When Apple did the 
  PowerPC -> x86 transition, they should have just transitioned to x86-64, 
  and never had a 32-bit space.

  But Apple does stupid things, that seem to be driven by marketing rather 
  than thinking deeply about the technology, and now they basically _have_ 
  to default to that 32-bit environment. ]

Oh well. 

			Linus
