On Sun, 18 Jun 2006, Linus Torvalds wrote:
>
> The "refs" field, which is really needed only for fsck, is maintained in
> a separate hashed lookup-table, allowing all normal users to totally
> ignore it.

Btw, in case people wondered: the cost to git-fsck-objects seems to be zero
and sometimes apparently even negative.

In order to remove "refs" from "struct object", I had to add it to the
object_refs structure instead, and so you'd think that the memory usage for
git-fsck-objects (which needs the object refs) should be unchanged, while
the hashed lookup should be more expensive than just the direct pointer
lookup.

Actually testing it, though, implies that isn't the case. Lots of objects
(every single blob object, in fact) have no refs at all, and for that case
we don't create any "object_refs" structure at all, so we don't actually
end up with a 1:1 relationship, and we win a small amount of memory.

And the hashing seems to be effective enough that it's no costlier than
looking up the ref pointer directly from the object. There's probably some
bad cache behaviour from the hashing, but it didn't show up in the
benchmarking I did (i.e. fsck took as long before as it did afterwards, both
for git and for the kernel archive).

It may be (probably is) that the reachability analysis is just a very small
portion of the overall costs, and it's just not very noticeable. It may
also be that whatever bad cache behaviours you get from the extra hash
lookup are just balanced out by the objects themselves being slightly
denser and better in the cache (although that is probably partly hidden
again by the extra malloc padding).

Regardless, there don't really seem to be any downsides, but I didn't
test it _that_ exhaustively.

		Linus
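
For anyone who wants to see the shape of the idea described above, here is
a minimal sketch of a side table that maps an object pointer to its refs.
It is not the code from the patch. The struct layouts are simplified
stand-ins, and hash_ptr(), find_slot(), lookup_refs(), set_refs(), the
fixed TABLE_SIZE and the linear probing are all assumptions made for the
example. The point it illustrates is only that objects without refs (blobs)
never get an entry, so the table is not 1:1 with the objects.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not git's real definitions. */
struct object {
	unsigned type;		/* note: no "refs" pointer in here */
};

struct object_refs {
	struct object *base;	/* the object these refs belong to */
	unsigned count;
	struct object *ref[];	/* the objects that "base" points at */
};

#define TABLE_SIZE 1024		/* power of two, fixed size for brevity */

static struct object_refs *table[TABLE_SIZE];
static unsigned nr_entries;

/* Hash the pointer value itself; the multiplier is an arbitrary odd constant. */
static unsigned hash_ptr(const struct object *obj)
{
	uintptr_t v = (uintptr_t)obj;
	return (unsigned)((v >> 4) * 2654435761u) & (TABLE_SIZE - 1);
}

/*
 * Linear probing: return the slot that holds "obj", or the empty slot
 * where it would be inserted. Terminates because set_refs() always
 * leaves at least one slot empty.
 */
static unsigned find_slot(const struct object *obj)
{
	unsigned i = hash_ptr(obj);
	while (table[i] && table[i]->base != obj)
		i = (i + 1) & (TABLE_SIZE - 1);
	return i;
}

/* Returns NULL for any object that has no refs recorded (e.g. a blob). */
struct object_refs *lookup_refs(const struct object *obj)
{
	assert(obj != NULL);
	return table[find_slot(obj)];
}

/* Record the refs of "obj"; objects with zero refs get no entry at all. */
void set_refs(struct object *obj, struct object **targets, unsigned count)
{
	struct object_refs *refs;
	unsigned slot, i;

	if (!count)
		return;

	refs = malloc(sizeof(*refs) + count * sizeof(refs->ref[0]));
	if (!refs)
		abort();
	refs->base = obj;
	refs->count = count;
	for (i = 0; i < count; i++)
		refs->ref[i] = targets[i];

	slot = find_slot(obj);
	if (table[slot]) {
		free(table[slot]);	/* replace an earlier entry */
	} else {
		assert(nr_entries + 1 < TABLE_SIZE);
		nr_entries++;
	}
	table[slot] = refs;
}
```

A caller that needs reachability information, as fsck does, would call
set_refs() while parsing commits, trees and tags, and lookup_refs() while
walking. Code that never cares about refs simply never touches the table,
and "struct object" stays one pointer smaller for everybody.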