On Wed, 18 Oct 2006, Shawn Pearce wrote:
>
> I guess its my turn then to work in the mmap window code, huh?  :-)

There are bigger reasons to _never_ allow packs to contain deltas to
outside of themselves:

 - there's no point.

   If you have many small packs, you're doing something wrong. The whole
   _point_ of packs is to put things into the same file, so that you can
   avoid the filesystem overhead. And once packs are big and few, the
   advantage of having deltas to outside the pack is basically zero.

 - it's a bad design.

   Self-sufficient packs mean that a pack is a "safe" thing. When the
   index says that it contains an object, then it damn well contains it.

   In contrast, if you had packs that only contained a delta, and the
   pack needed some _other_ pack (or loose object) to actually generate
   that object, then it's not safe any more. You could end up with a
   situation where you get two packs from two different sources, and
   they contain deltas to _each_other_, and you have no way of actually
   generating the object itself any more.

   (Or you end up having to have rules to figure out when you have a
   loop, and stop looking just in the packed files, and start looking
   for loose objects instead)

In other words, it has potentially _serious_ downsides.

So DAMMIT! Stop looking to make the data structures worse. The fact is,
the git data structures are FINE. They are well-designed. They work
well. There's no _point_ in changing them, especially since changing
them seems to be all about making things less reliable for dubious gain.

One of the advantages of git is that you can explain things with object
relationships, and that the file format is stable as _hell_. That's a
GOOD thing.

Please realize that if you want to change the file formats, you'd need a
hell of a better reason for it than "just because I can". Please.
Really.

So next time somebody suggests a new pack-format, ask yourself:

 - does it save disk-space by 50% or more?
 - does it drop memory usage by 50% or more?
 - does it improve performance by 50% or more?
 - does it make something possible that really fundamentally isn't
   possible right now?

And if the answer to those questions is "no", then JUST DON'T DO IT.

It really needs to be _damn_ spectacular to be worthy of a new format.
Really. We've had a few of those, so it clearly does happen:

 - The "compress _after_ SHA1". The original object format was just
   broken, and the SHA1 name depended on how things compressed. I fixed
   it. It needed fixing. We couldn't have done a lot of the things we
   did without switching compression and SHA1-hashing around.

 - the pack-file in the first place: this saved orders of magnitude both
   in disk-space _and_ performance. Not "10%". More like "factors of
   100". THAT was worthy of a major format change.

 - the "make loose object contents look the same as packed objects".
   This was not just a cleanup, it allows us to create pack-files much
   faster. That said, we're still defaulting to the legacy format, and
   maybe it wasn't really worth it.

My personal suspicion is that we'll want to have a 64-bit index file
some day, and THAT is worthy of a format change. That day is not now,
btw. It's probably not even very close. Even the Mozilla repo that was
pushing the limit was only doing so until it was optimized better, and
now it's apparently nowhere _near_ that limit.

But even then, we might well want to update _just_ the index file
format.

Because in an SCM, stability and trustworthiness are more important
than just about _anything_ else.

			Linus
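
The delta-loop scenario described in the message (two packs whose deltas
point at each other) can be illustrated with a minimal sketch. This is a
toy in-memory model written for illustration, not git's actual pack
format or code: the dict layout, the "base"/"delta" tuples, and the
functions resolve() and is_self_contained() are all hypothetical, and a
"delta" here simply appends bytes to its base.

```python
# Toy model (an assumption, not git's real format):
#   pack = {object_id: ("base", data) or ("delta", base_id, suffix)}

def resolve(obj_id, packs, seen=None):
    """Rebuild an object by following delta bases across all given packs."""
    seen = set() if seen is None else seen
    if obj_id in seen:
        # We came back to an object we are already trying to build.
        raise RuntimeError(f"delta loop at {obj_id}")
    seen.add(obj_id)
    for pack in packs:
        if obj_id in pack:
            entry = pack[obj_id]
            if entry[0] == "base":
                return entry[1]
            _, base_id, suffix = entry
            return resolve(base_id, packs, seen) + suffix
    raise KeyError(obj_id)


def is_self_contained(pack):
    """True if every delta's base lives in the same pack.

    This only checks where bases live; it does not look for cycles
    inside a single pack.
    """
    return all(e[0] == "base" or e[1] in pack for e in pack.values())


# Two packs from different sources, each holding only a delta
# against an object that lives in the other pack.
pack_a = {"A": ("delta", "B", b"+a")}
pack_b = {"B": ("delta", "A", b"+b")}

# A pack where every delta has its base alongside it.
good = {"X": ("base", b"hello"), "Y": ("delta", "X", b" world")}
```

In this model, resolve("Y", [good]) rebuilds the object from the one
pack alone, while resolve("A", [pack_a, pack_b]) can only detect the
loop and give up: neither index entry is backed by real object data.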