Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle

On Wed, 18 Oct 2006, Shawn Pearce wrote:
>
> I guess it's my turn then to work on the mmap window code, huh?  :-)

There are bigger reasons to _never_ allow packs to contain deltas against 
objects outside of themselves:

 - there's no point. 

   If you have many small packs, you're doing something wrong. The whole 
   _point_ of packs is to put things into the same file, so that you can 
   avoid the filesystem overhead. And once packs are big and few, the 
   advantage of having deltas to outside the pack is basically zero.

 - it's a bad design. 

   Self-sufficient packs mean that a pack is a "safe" thing. When the 
   index says that it contains an object, then it damn well contains it.

   In contrast, if you had packs that only contained a delta, and the pack 
   needed some _other_ pack (or loose object) to actually generate that 
   object, then it's not safe any more. You could end up with a situation 
   where you get two packs from two different sources, and they contain 
   deltas to _each_other_, and you have no way of actually generating the 
   object itself any more.

   (Or you end up having to have rules to figure out when you have a loop,
   and stop looking just in the packed files, and start looking for loose 
   objects instead)

   In other words, it has potentially _serious_ downsides.
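
The loop problem above can be sketched with a toy model. This is only an
illustration, not git's pack code: the dictionaries, the "suffix" deltas and
the names `resolve` and `bases_are_local` are all made up for the example
(real deltas are copy/insert instructions against a base object).

```python
# Toy model (hypothetical, not git's real format): a pack maps an object id
# to either ("full", data) or ("delta", base_id, suffix).

def resolve(obj_id, packs, seen=None):
    """Rebuild an object, looking for delta bases in any of the given packs."""
    seen = set() if seen is None else seen
    if obj_id in seen:
        # We came back to an object we are already trying to rebuild.
        raise RuntimeError(f"delta loop while resolving {obj_id}")
    seen.add(obj_id)
    for pack in packs:
        if obj_id in pack:
            entry = pack[obj_id]
            if entry[0] == "full":
                return entry[1]
            _, base_id, suffix = entry
            return resolve(base_id, packs, seen) + suffix
    raise KeyError(f"{obj_id} is in no pack")

def bases_are_local(pack):
    """True if every delta in the pack has its base inside the same pack."""
    return all(e[0] == "full" or e[1] in pack for e in pack.values())

# A self-sufficient pack: the delta's base lives next to it.
good_pack = {"A": ("full", b"hello"), "B": ("delta", "A", b" world")}

# Two packs from two sources whose deltas point at each other.
pack_x = {"A": ("delta", "B", b"!")}
pack_y = {"B": ("delta", "A", b"?")}
```

With `good_pack` alone, `resolve("B", [good_pack])` rebuilds the object from
that one file. With `pack_x` and `pack_y`, both indexes claim to "contain"
their object, yet `resolve("A", [pack_x, pack_y])` can only detect the loop
and give up: there is no full copy anywhere to start from.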

So DAMMIT! Stop looking to make the data structures worse. The fact is, 
the git data structures are FINE. They are well-designed. They work well. 
There's no _point_ in changing them, especially since changing them seems 
to be all about making things less reliable for dubious gain.

One of the advantages of git is that you can explain things with object 
relationships, and that the file format is stable as _hell_. That's a GOOD 
thing. Please realize that if you want to change the file formats, you'd 
better have a hell of a better reason for it than "just because I can".

Please. Really.

So next time somebody suggests a new pack-format, ask yourself:

 - does it save disk-space by 50% or more?

 - does it drop memory usage by 50% or more?

 - does it improve performance by 50% or more?

 - does it make something possible that really fundamentally isn't 
   possible right now?

And if the answer to those questions is "no", then JUST DON'T DO IT.

It really needs to be _damn_ spectacular to be worthy of a new format. 
Really. We've had a few of those, so it clearly does happen:

 - The "compress _after_ SHA1". The original object format was just 
   broken, and the SHA1 name depended on how things compressed. I fixed 
   it. It needed fixing. We couldn't have done a lot of the things we did 
   without switching compression and SHA1-hashing around.

 - the pack-file in the first place: this saved orders of magnitude both 
   in disk space _and_ performance. Not "10%". More like "factors of 100".

   THAT was worthy of a major format change.

 - the "make loose object contents look the same as packed objects". This 
   was not just a cleanup, it allows us to create pack-files much faster. 

   That said, we're still defaulting to the legacy format, and maybe it 
   wasn't really worth it. 
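
To make the "compress _after_ SHA1" point concrete, here is a minimal sketch
of why the order matters. It assumes the usual blob naming (SHA1 over a
"blob <size>\0" header plus the raw content); the function names are made up
for the example, and `old_style_name` is only an approximation of what
"name depends on how things compressed" amounts to, not the original code.

```python
import hashlib
import zlib

def _with_header(content: bytes) -> bytes:
    # Uncompressed object: "blob <size>\0" followed by the raw content.
    return b"blob %d\x00" % len(content) + content

def object_name(content: bytes) -> str:
    """Hash first, compress later: the name never sees the zlib output."""
    return hashlib.sha1(_with_header(content)).hexdigest()

def stored_bytes(content: bytes, level: int) -> bytes:
    """What would be written to disk at a given zlib compression level."""
    return zlib.compress(_with_header(content), level)

def old_style_name(content: bytes, level: int) -> str:
    """Assumed sketch of the broken scheme: hash the compressed bytes."""
    return hashlib.sha1(stored_bytes(content, level)).hexdigest()

content = b"hello world\n" * 100
name = object_name(content)
fast = stored_bytes(content, 1)   # quick, possibly larger
small = stored_bytes(content, 9)  # slower, possibly smaller
```

Both `fast` and `small` decompress to the same bytes and therefore hash to the
same `name`, so the compression level (or the zlib version) can change freely.
With `old_style_name`, any difference in the compressed output would give the
same content a different name.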

My personal suspicion is that we'll want to have a 64-bit index file some 
day, and THAT is worthy of a format change. That day is not now, btw. It's 
probably not even very close. Even the mozilla repo that was pushing the 
limit was only doing so until it was optimized better, and now it's 
apparently nowhere _near_ that limit.

But even then, we might well want to update _just_ the index file format.
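
As a rough illustration of what "the limit" means here, assume the index
records each object's byte offset into the pack as a 4-byte unsigned integer
(an assumption for this sketch; the helper names are made up too). Then the
index simply cannot point past 4 GiB, and widening just that field is an
index-only change:

```python
import struct

# Largest byte offset a 4-byte unsigned field can hold (4 GiB minus one).
MAX_32BIT_OFFSET = 2**32 - 1

def encode_offset_32(offset: int) -> bytes:
    """Big-endian 4-byte offset; raises struct.error if it does not fit."""
    return struct.pack(">I", offset)

def encode_offset_64(offset: int) -> bytes:
    """Big-endian 8-byte offset, as a hypothetical wider index entry."""
    return struct.pack(">Q", offset)

def fits_in_32_bits(offset: int) -> bool:
    return 0 <= offset <= MAX_32BIT_OFFSET
```

An object starting at byte `2**32` of a pack cannot be described by
`encode_offset_32`, while `encode_offset_64` handles it without touching the
pack data itself.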

Because in an SCM, stability and trustworthiness are more important than 
just about _anything_ else. 

			Linus