On Fri, Sep 26, 2008 at 12:39:54AM +0200, Andreas Ericsson wrote: > Yann Dirson wrote: >> " try unchanged files as candidate for copy detection.\n" \ >> +" --factorize_renames\n" \ >> +" factorize renames of all files of a directory.\n" \ > > > Use dashes to separate words in arguments, please. Fortunately, the typo only made it to the help string, thanks for catching it :) >> +#include <libgen.h> >> + > > +#ifdef basename > +# undef basename > +#endif > > We might as well use the GNU version of basename() at least. Even if > you don't use it, I'd rather not see this bite some unwary programmer > coming along after you. Worst case scenario, sha1's won't add up if > POSIX basename alters its argument, making for one fiendishly tricky > bug to find. I'll look into that, thanks >> +/* >> + * FIXME: we could optimize the 100%-rename case by preventing >> + * recursion to unfold what we know we would refold here. >> + * FIXME: do we want to replace linked list with sorted array ? > > Either that or a hash. Most of the time seems to be spent in lookups. > With a hash you get quick lookups and reasonably quick inserts. With > a sorted array you get lower memory footprint than anything else and > can bisect your way to the right entry, which performs reasonably > close to skiplists. The sort overhead might be a deterrant factor, > but insofar as I understand it trees are always sorted in git anyway, > so perhaps that'd be the best solution. Thanks for this insight - I'll wait before changing the data structure, since the current implementation may need to be able to recurse (consider renamed subdirs as well as renamed files). > Apart from that, I'd need to apply the patch to review it properly, > and I'm far too tired for that now. > > I like the direction this is going though, so thanks a lot for doing > it :) Great :) Best regards, -- Yann -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html