Linus Torvalds wrote:
> On Sun, 26 Mar 2006, Jakub Narebski wrote:
>>
>> If (2) is common enough then discussed improvements to rename detection,
>> namely comparing basenames as a base for candidate selection is a good
>> idea.
>
> BK had this "renametool" which got started automatically when you applied
> a patch that removed one or more files and added one or more files, so
> that you could then pair up the files manually.
[...]
> The thing is, the fast rename detection that is in the "next" branch
> really does a lot better, and it's fast enough.

I was thinking about the fast rename detection algorithm in the "next"
branch. The question is whether recording additional (helper) information
about content copying and moving, as the mentioned "renametool" did, is
worth the effort, both in coding it and from the user's point of view, or
whether better detection of content copying and moving ("rename detection")
in whatchanged and similar tools would suffice.

I am of the opinion that voluntary information about content moving and
copying in the commits would help.

Purposes:

1.) Record content moving and similarity information which cannot be
    calculated, or cannot be calculated easily; see Paul Jakma's response
    in this thread,
    Message-ID: <Pine.LNX.4.64.0603270642090.5276@xxxxxxxxxxxxxxx>
    Examples: copying a fragment of code (a small fragment of the whole
    file), creating documentation or a header file from code, creating a
    code skeleton from a template, or rewriting code in a different
    language (e.g. shell script to Perl, or script to compiled code,
    e.g. Perl or Python to C).

2.) Cache the results of the similarity algorithm/rename detection tool
    (also from Paul Jakma's post), including remembering false positives
    and undetected renames, for efficiency. The automatically calculated
    parts might be throw-away.

Sources of information:

1.) Manually entered information *at commit*, including *-rm, *-mv, *-cp
    like commands (which nobody likes) and a systematized notation
    (pseudolanguage?) for copying and moving contents in the log messages.

2.) Semi-manual tools like the mentioned "renametool" of BK.

3.) Support from the editor: remembering where a copied-and-pasted or
    cut-and-pasted fragment came from, and providing a prefilled command
    to record contents moving ("renames") or a prefilled commit log
    containing this information. Hard to get, probably most useful.

4.) Information from resolved merges and results of diagnosis
    (pickaxe-like) tools, especially recording "renames" which were not
    detected, and removing "renames" which were detected falsely.

Is that the place where I should provide code (patch) for testing the
idea :) ?

>> I wonder how common is (2) compared to (1)+(2) i.e. move to other dir
>> and rename, old-dir/old-file.c to new-dir/new-subdir/new-file.c
>
> For example, one common case was a directory structure like
>
>	..
>	type-file1.c
>	type-file2.c
>	otherfiles.c
>	yet-more.c
>	..
>
> being split up into a subdirectory
>
>	..
>	type/file1.c
>	type/file2.c
>	otherfiles.c
>	yet-more.c
>	..
>
> (eg drivers/scsi/aic7xx-* being given a subdirectory of it's own, as
> drivers/scsi/aic7xx/*). So the basename wouldn't stay the same, because it
> contained some piece of data that became redundant with the move.

Perhaps the fast rename detection algorithm needs some smart similarity
estimate for names, which would put more weight on the parts closer to the
basename, and would detect */type-file1.c and */type/file1.c as similar.

-- 
Jakub Narebski
Warsaw, Poland
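
To make the name-similarity idea from the last paragraph concrete, here is
a minimal sketch in Python. It is not the algorithm in the "next" branch,
only an illustration under stated assumptions: the function names
(name_tokens, name_similarity), the choice of '/', '-' and '_' as token
separators, and the decay factor of 0.5 are all hypothetical. Tokens closer
to the end of the path (the basename) get a larger weight, and the score is
a weighted overlap between the two token sets.

```python
import re


def name_tokens(path):
    """Split a path on '/', '-' and '_' (an assumed separator set), so that
    'type-file1.c' and 'type/file1.c' produce the same tokens."""
    return [tok for tok in re.split(r"[/\-_]", path) if tok]


def token_weights(path, decay):
    """Map each token to a weight; the last token (end of the basename)
    gets 1.0, and each step towards the root multiplies it by 'decay'."""
    weights = {}
    for distance, tok in enumerate(reversed(name_tokens(path))):
        # Keep the weight of the occurrence closest to the basename.
        weights.setdefault(tok, decay ** distance)
    return weights


def name_similarity(old, new, decay=0.5):
    """Weighted overlap of path tokens, in the range 0.0 .. 1.0."""
    wa = token_weights(old, decay)
    wb = token_weights(new, decay)
    union = sorted(set(wa) | set(wb))
    total = sum(max(wa.get(t, 0.0), wb.get(t, 0.0)) for t in union)
    if total == 0:
        return 0.0
    shared = sum(min(wa.get(t, 0.0), wb.get(t, 0.0)) for t in union)
    return shared / total


if __name__ == "__main__":
    print(name_similarity("dir/type-file1.c", "dir/type/file1.c"))
    print(name_similarity("dir/type-file1.c", "dir/otherfiles.c"))
```

With these assumptions, dir/type-file1.c and dir/type/file1.c tokenize
identically and score as fully similar, while dir/type-file1.c and
dir/otherfiles.c share only the directory token and score low. Such a score
could only serve for candidate selection; the content similarity check
would still decide.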