On Mon, Apr 02, 2012 at 09:51:21AM -0700, Shawn O. Pearce wrote:

> Probably. But we tend to hate caches in Git because they can get stale
> and need to be rebuilt, and are redundant with the base data. The
> mythical "pack v4" work was going to approach this problem by storing
> the commit timestamps uncompressed in a more machine friendly format.
> Unfortunately the work has been stalled for years.

I'd love for packv4 to exist, but even once it does, it comes with its
own complications for network transfer (since we will have to translate
to/from packv2 on the wire).

Has anyone looked seriously at a new index format that stores the
redundant information in a more easily accessible way? It would increase
our disk usage, but for something like linux-2.6, only by 10MB per
32-bit word. On most of my systems I would gladly spare some extra RAM
for the disk cache if it meant I could avoid inflating a bunch of
objects. And this could easily be made optional for systems that don't
want to make the tradeoff (if it's not there, you fall back to the
current procedure; we could even store the data in a separate file to
retain indexv2 compatibility).

So it's sort of a cache, in that it's redundant with the actual data. But
staleness and writing issues are a lot simpler, since it only gets
updated when we index the pack (and the pack index in general is a
similar concept; we are "caching" the location of the object in the
packfile, rather than doing a linear search to look it up each time).

-Peff
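A minimal C sketch of the optional separate-file idea, assuming a hypothetical sidecar that stores one big-endian 32-bit word per object (here a commit timestamp) in the same SHA-1-sorted order as the entries of the matching pack .idx. struct sidecar, sidecar_fill(), commit_date() and inflate_and_parse_date() are illustrative names only; this is not Git code or an existing file format. The table is written once when the pack is indexed, a lookup is a constant-time array access by .idx position, and a missing or mismatched sidecar falls back to the current (slow) procedure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sidecar table (NOT a real Git format): one 32-bit
 * big-endian word per object, in the same order as the .idx entries.
 */
struct sidecar {
	uint32_t nr;                /* object count; must match the .idx */
	const unsigned char *words; /* nr * 4 bytes, big-endian */
};

static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Written once, at pack-indexing time; never updated afterwards. */
static void sidecar_fill(unsigned char *buf, const uint32_t *dates, uint32_t nr)
{
	assert(buf && dates);
	for (uint32_t i = 0; i < nr; i++)
		put_be32(buf + (size_t)i * 4, dates[i]);
}

/* Counts how often the slow path was taken, for illustration. */
static int slow_path_calls;

/*
 * Hypothetical stand-in for the current procedure (inflate the commit
 * and parse its committer line); it just fabricates a value here.
 */
static uint32_t inflate_and_parse_date(uint32_t pos)
{
	slow_path_calls++;
	return 1000000000u + pos;
}

/*
 * Timestamp of the object at .idx position `pos`. Without a sidecar,
 * or if its object count does not match the .idx (a minimal staleness
 * guard for this sketch), fall back to the slow path.
 */
static uint32_t commit_date(const struct sidecar *sc, uint32_t nr_in_idx,
			    uint32_t pos)
{
	if (sc && sc->nr == nr_in_idx && pos < sc->nr)
		return get_be32(sc->words + (size_t)pos * 4);
	return inflate_and_parse_date(pos);
}

/* Extra disk cost of one 32-bit word per object, in bytes. */
static uint64_t sidecar_bytes(uint64_t nr_objects)
{
	return nr_objects * 4;
}
```

At 4 bytes per object, the 10MB-per-word figure for linux-2.6 corresponds to roughly 2.5 million objects (sidecar_bytes(2500000) is 10,000,000 bytes). Keeping the data in its own file, rather than in the .idx, is what lets readers that don't know about it keep using indexv2 unchanged.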