David Lang wrote:
> On Mon, 27 Mar 2006, Jakub Narebski wrote:
>
>> 2.) Caching the results of the similarity algorithm/rename detection
>> tool (also Paul Jakma's post), including remembering false positives
>> and undetected renames, for efficiency. Automatically calculated
>> parts might be throwaway.
>
> This sounds like it could easily devolve into an O(n!) situation where
> you are caching how everything is related (or not related) to
> everything else. Paul was making the point that the purpose was to
> cache the data to eliminate the time needed to calculate it, but if
> you don't store all the results, then you don't know whether a result
> is irrelevant or simply unknown, so you need to calculate it again.

First of all, you only remember non-trivial relations (the trivial
case, e.g. file.c being related to file.c, goes without saying). If
the cache were only for commits, it would be O(c*p*n), where c is the
number of commits, p is the percentage of contents moving ("renames")
times the percentage of files changed in the commit, and n is the
number of files (probably O(c) in practice). Even if we remembered
results for all (tree1, tree2) pairs, it would be O(c^2).
Additionally, the cache can be limited in size (pruning the oldest
entries first).
--
Jakub Narebski
Warsaw, Poland
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
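To make the idea concrete, here is a minimal sketch (not git code; the
class name, key scheme, and size limit are all my assumptions) of a
size-limited cache keyed by (tree1, tree2) pairs, which skips trivial
self-relations and prunes the oldest entries first:

```python
from collections import OrderedDict

class RenameCache:
    """Hypothetical size-limited cache for rename-detection results.

    Keys are (tree1, tree2) pairs; values map old paths to new paths.
    Trivial relations (a path related to itself) are never stored.
    When the size limit is exceeded, the oldest entry is pruned first.
    """

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self._cache = OrderedDict()  # insertion order doubles as age

    def put(self, tree1, tree2, renames):
        # Drop trivial self-relations; only non-trivial pairs are cached.
        nontrivial = {old: new for old, new in renames.items() if old != new}
        self._cache[(tree1, tree2)] = nontrivial
        # Prune oldest cached results once the limit is exceeded.
        while len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)

    def get(self, tree1, tree2):
        # None means "unknown, must recompute";
        # an empty dict means "computed, no renames found".
        return self._cache.get((tree1, tree2))

cache = RenameCache(max_entries=2)
cache.put("t1", "t2", {"a.c": "b.c", "same.c": "same.c"})
print(cache.get("t1", "t2"))  # {'a.c': 'b.c'} -- trivial pair dropped
print(cache.get("t1", "t3"))  # None -- unknown, must recompute
```

The distinction between `None` and `{}` in `get` is what addresses
David's point: without it, a cache miss is indistinguishable from
"computed and found nothing", forcing recomputation either way.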