This is the cleaned-up version of the commit caching patches I mentioned
here:

  http://article.gmane.org/gmane.comp.version-control.git/212329

The basic idea is to generate a cache file that sits alongside a
packfile and contains the timestamp, tree, and parents in a more compact
and easy-to-access format.

The timings from this one are roughly similar to what I posted earlier.
Unlike the earlier version, this one keeps the data for a single commit
together for better cache locality (though I don't think it made a big
difference in my tests, since my cold-cache timing test ends up touching
every commit anyway). The short of it is that for an extra 31M of disk
space (~4%), my warm-cache time for "git rev-list --all" drops from
~4.2s to ~0.66s.

The big thing it does not (yet) do is use offsets to reference sha1s, as
Shawn suggested. This would potentially drop the on-disk size from 84
bytes to 16 bytes per commit (or about 6M total for linux.git).

Coupled with using compression level 0 for trees (which do not compress
well at all, and yield only a 2% increase in size when left
uncompressed), my "git rev-list --objects --all" time drops from ~40s to
~25s. Perf reveals that we're spending most of the remaining time in
lookup_object. I've spent a fair bit of time trying to optimize that,
but with no luck; I think it's fairly close to optimal. The problem is
just that we call it a very large number of times, since it is the
mechanism by which we recognize that we have already processed each
sha1.

  [1/6]: csum-file: make sha1write const-correct
  [2/6]: strbuf: add string-chomping functions
  [3/6]: introduce pack metadata cache files
  [4/6]: introduce a commit metapack
  [5/6]: add git-metapack command
  [6/6]: commit: look up commit info in metapack

-Peff
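
To make the size figures above concrete, here is a rough Python sketch of
a fixed-width commit cache. It is not code from the series. The record
layout (commit sha1, 4-byte timestamp, tree sha1, two parent sha1s) is an
assumption chosen because it adds up to the 84 bytes quoted above, and
the 16-byte variant (timestamp plus three 32-bit index positions) is
likewise just one layout that matches the quoted size. The names RECORD,
COMPACT, build_cache and lookup are hypothetical.

```python
import struct

# Assumed 84-byte record: commit sha1, timestamp, tree sha1, two parent sha1s.
# 20 + 4 + 20 + 20 + 20 = 84. Illustrative only, not the real on-disk format.
RECORD = struct.Struct(">20sI20s20s20s")

# Assumed 16-byte record if sha1s were replaced by 32-bit positions:
# timestamp, tree position, two parent positions.
COMPACT = struct.Struct(">IIII")

NULL_SHA1 = b"\0" * 20  # stand-in meaning "no parent in this slot"


def build_cache(commits):
    """Pack (sha1, timestamp, tree, parents) tuples, sorted by commit sha1."""
    out = bytearray()
    for sha1, timestamp, tree, parents in sorted(commits, key=lambda c: c[0]):
        if len(parents) > 2:
            # Assumption of this sketch: commits with more than two parents
            # do not fit the fixed-width record and are simply left out.
            continue
        padded = list(parents) + [NULL_SHA1] * (2 - len(parents))
        out += RECORD.pack(sha1, timestamp, tree, *padded)
    return bytes(out)


def lookup(cache, sha1):
    """Binary-search the sorted records; return (timestamp, tree, parents)."""
    lo, hi = 0, len(cache) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        off = mid * RECORD.size
        key = cache[off:off + 20]
        if key < sha1:
            lo = mid + 1
        elif key > sha1:
            hi = mid
        else:
            _, timestamp, tree, p1, p2 = RECORD.unpack_from(cache, off)
            parents = [p for p in (p1, p2) if p != NULL_SHA1]
            return timestamp, tree, parents
    return None  # not cached; a caller would fall back to parsing the commit


def scaled_size(total_84, record_size=COMPACT.size):
    """Scale a cache size measured with 84-byte records to another width."""
    return total_84 * record_size / RECORD.size
```

With these definitions, scaled_size(31) comes out just under 6, which
lines up with the "31M" and "about 6M" figures quoted above. Keeping
every field of a commit in one record is also what keeps the data for a
single commit adjacent in the file.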