[resent because of malformed headers causing rejection] On May 24, 2007, at 03:12, Shawn O. Pearce wrote:
> I still don't buy the idea that these megablobs shouldn't be packed.
> I understand Dana's pain here (at least a little bit, my problems
> aren't as bad as his are), but I also hate to see us run away from
> packfiles for these really sick cases just because we have some
> issues in our current packfile handling.  Packfiles give us a lot
> of benefits:
>
>  1) less inode usage;
Using one inode per huge blob can never be an issue.
>  2) transport can write directly to local disk;
>
>  3) transport can (quickly) copy from local disk;
Both of these can be done by re-enabling the new loose object format.
>  4) testing for existence is *much* faster;
>
>  5) deltafication is possible;
Look at it the other way: if we have huge objects (say >1GB), we should put them in a pack of their own anyway. What's better, a pack with a separate index file or just a loose object?

While the one-object-per-file model is awful for many small files with lots of similarity, it is really quite efficient for large objects, and the most reasonable model for huge ones. Such blobs are just too large to do anything useful with; the only operations ever done on them will be to check them in or check them out. Ideally, we should never try to have them in memory at all, but just stream them to/from disk while compressing or decompressing (a rough sketch of what that could look like is below).

Trying to deltify huge objects just takes too much time. Similarly, we don't want to read 100MB only to apply a delta and then throw out half of the data we read in the first place; it's just too inefficient. Even reading the huge blobs once during "git repack" would waste so much time that we're unlikely to ever gain it back in any real-world scenario.

-Geert
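For illustration only (this is not git's code): a minimal sketch of the kind of streaming I mean, deflating through zlib with fixed-size buffers so memory use stays constant no matter how large the blob gets. The function name and the 64 KiB chunk size are arbitrary, and a real loose-object writer would also prepend the object header and compute the SHA-1 while streaming; error handling is abbreviated.

	/*
	 * Sketch only, not git's implementation: deflate a large file
	 * in fixed-size chunks so memory use stays constant regardless
	 * of the blob's size.
	 */
	#include <stdio.h>
	#include <string.h>
	#include <zlib.h>

	#define CHUNK 65536	/* 64 KiB working buffer, an arbitrary choice */

	static int stream_deflate(FILE *in, FILE *out)
	{
		unsigned char ibuf[CHUNK], obuf[CHUNK];
		z_stream z;
		int flush;

		memset(&z, 0, sizeof(z));
		if (deflateInit(&z, Z_DEFAULT_COMPRESSION) != Z_OK)
			return -1;

		do {
			/* read the next chunk; never more than CHUNK bytes held */
			z.avail_in = fread(ibuf, 1, CHUNK, in);
			z.next_in = ibuf;
			flush = feof(in) ? Z_FINISH : Z_NO_FLUSH;

			/* compress it, draining the output buffer as needed */
			do {
				z.avail_out = CHUNK;
				z.next_out = obuf;
				deflate(&z, flush);
				fwrite(obuf, 1, CHUNK - z.avail_out, out);
			} while (z.avail_out == 0);
		} while (flush != Z_FINISH);

		deflateEnd(&z);
		return 0;
	}

	int main(int argc, char **argv)
	{
		/* usage: stream_deflate <infile> <outfile>; defaults to stdin/stdout */
		FILE *in = argc > 1 ? fopen(argv[1], "rb") : stdin;
		FILE *out = argc > 2 ? fopen(argv[2], "wb") : stdout;

		if (!in || !out)
			return 1;
		return stream_deflate(in, out) ? 1 : 0;
	}

The point is simply that the working set is two 64 KiB buffers, not the 1GB blob itself; checkout would be the same loop run through inflate instead.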