[resent because of malformed headers causing rejection] On May 24, 2007, at 03:12, Shawn O. Pearce wrote:
> I still don't buy the idea that these megablobs shouldn't be packed.
> I understand Dana's pain here (at least a little bit, my problems
> aren't as bad as his are), but I also hate to see us run away from
> packfiles for these really sick cases just because we have some
> issues in our current packfile handling.  Packfiles give us a lot
> of benefits:
>
>  1) less inode usage;
Using one inode per huge blob can never be an issue.
>  2) transport can write directly to local disk;
>
>  3) transport can (quickly) copy from local disk;
Both of these can be done by re-enabling the new loose object format.
>  4) testing for existence is *much* faster;
>
>  5) deltafication is possible;
Look at it the other way: if we have huge objects (say >1GB), we should put them in a pack of their own anyway. What's better, a pack with a separate index file or just a loose object?

While the one-object-per-file model is awful for many small files with lots of similarity, it is really quite efficient for large objects, and the most reasonable model for huge ones. Such blobs are just too large to do anything useful with; the only operations ever done on them will be to check them in or check them out. Ideally, we should never try to have them in memory at all, but just stream them to/from disk while compressing or decompressing (a rough sketch of what that could look like is below).

Trying to deltify huge objects just takes too much time. Similarly, we don't want to read 100MB only to apply a delta and then throw out half of the data we read in the first place; it's just too inefficient. Even reading the huge blobs once during "git repack" would waste so much time that we're unlikely to ever gain it back in any real-world scenario.

-Geert
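For illustration only (this is not git's code): a minimal sketch of the kind of streaming I mean, deflating through zlib with fixed-size buffers so memory use stays constant no matter how large the blob gets. The function name and the 64 KiB chunk size are arbitrary, and a real loose-object writer would also prepend the object header and compute the SHA-1 while streaming; error handling is abbreviated.

	/*
	 * Sketch only, not git's implementation: deflate a large file
	 * in fixed-size chunks so memory use stays constant regardless
	 * of the blob's size.
	 */
	#include <stdio.h>
	#include <string.h>
	#include <zlib.h>

	#define CHUNK 65536	/* 64 KiB working buffer, an arbitrary choice */

	static int stream_deflate(FILE *in, FILE *out)
	{
		unsigned char ibuf[CHUNK], obuf[CHUNK];
		z_stream z;
		int flush;

		memset(&z, 0, sizeof(z));
		if (deflateInit(&z, Z_DEFAULT_COMPRESSION) != Z_OK)
			return -1;

		do {
			/* read the next chunk; never more than CHUNK bytes held */
			z.avail_in = fread(ibuf, 1, CHUNK, in);
			z.next_in = ibuf;
			flush = feof(in) ? Z_FINISH : Z_NO_FLUSH;

			/* compress it, draining the output buffer as needed */
			do {
				z.avail_out = CHUNK;
				z.next_out = obuf;
				deflate(&z, flush);
				fwrite(obuf, 1, CHUNK - z.avail_out, out);
			} while (z.avail_out == 0);
		} while (flush != Z_FINISH);

		deflateEnd(&z);
		return 0;
	}

	int main(int argc, char **argv)
	{
		/* usage: stream_deflate <infile> <outfile>; defaults to stdin/stdout */
		FILE *in = argc > 1 ? fopen(argv[1], "rb") : stdin;
		FILE *out = argc > 2 ? fopen(argv[2], "wb") : stdout;

		if (!in || !out)
			return 1;
		return stream_deflate(in, out) ? 1 : 0;
	}

The point is simply that the working set is two 64 KiB buffers, not the 1GB blob itself; checkout would be the same loop run through inflate instead.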