On Thu, Aug 02, 2012 at 06:41:55PM -0400, Jeff King wrote: > > (1a) is good regardless rename overrides. Why don't you polish and > > submit it? We can set some criteria to limit the cache size while > > keeping computation reasonably low. Caching rename scores for file > > pairs that has file size larger than a limit is one. Rename matrix > > size could also be a candidate. We could even cache just rename scores > > for recent commits (i.e. close to heads) only with the assumption that > > people diff/apply recent commits more often. > > I'll polish and share it. I'm still not 100% sure it's a good idea, > because introducing an on-disk cache means we need to _manage_ that > cache. How big will it be? Who will prune it when it gets too big? By > what criteria? And so on. > > But if it's all hidden behind a config option, then it won't hurt people > who don't use it. And people who do use it can gather data on how the > caches grow. Here it is, all polished up. I'm still a little lukewarm on it for two reasons: 1. The whole idea. For the reasons above, I'm a little iffy on doing this cache at all. It does yield speedups, but only in some specific cases. So it's hidden behind a diff.renamecaches option and off by default. 2. The implementation is a little...gross. Long ago, I had written a type-generic map class for git using void pointers. It ended up complex and had problems with unaligned accesses. So I rewrote it using preprocessor macro expansion (e.g., you'd call IMPLEMENT_MAP(foo, const char *, int) or similar). But that wasn't quite powerful enough, as I really want conditional compilation inside the macro expansion, but you can't #ifdef. So I really wanted some kind of code generation that could do conditionals. Which you can do with the C preprocessor, but rather than expanding macros, you have to #include templates that expand based on parameters you've set. Which is kind of ugly and non-intuitive, but it does work. Look at patch 1 to see what I mean. Also, this sort of pre-processor hackery to create type-generic data structures is the first step on the road that eventually led to C++ being developed. And that scares me a little. So yeah. Here it is. I'm not sure yet if it's a good idea or not. [1/8]: implement generic key/value map Infrastructure. [2/8]: map: add helper functions for objects as keys [3/8]: fast-export: use object to uint32 map instead of "decorate" [4/8]: decorate: use "map" for the underlying implementation These ones are optional for this series, but since we are introducing the infrastructure anyway (which is really just a generalized form of what "decorate" does), it offsets the code bloat. [5/8]: map: implement persistent maps [6/8]: implement metadata cache subsystem More infrastructure. [7/8]: implement rename cache [8/8]: diff: optionally use rename cache And these are the actual rename cache. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html