On Monday 2006 November 20 10:48, Junio C Hamano wrote:

> I wrote the code and you contradict me ;-)?

Sorry; I wasn't so much contradicting that the filtering works exactly as you
say (of course it must - I don't know anywhere near enough to make that sort
of assertion).  However, I do think that the problem is not one of filtering.
I was saying that "-C" has no practical use.

> in your example, it would give you the creation of fileB, not
> copy.

I'm sure it would - but you had to use --find-copies-harder; -C would not
find it as a copy.

>  - Renames are only picked up from files that were lost in the
>    same change (i.e. "mv fileA fileB" creates fileB and loses
>    fileA; fileB is checked if it is similar to fileA in the
>    original).

I've found rename detection to be flawless in all my uses.

>  - Copies are only picked up from files that were changed in the
>    same change (i.e. splitting major part of original file and
>    moving it to somewhere else, while leaving a skeleton in the
>    original file).  "harder" is needed if the copy original was
>    untouched, as you found out.

Yep; I understand that.  I also understand that it is done for performance
reasons.  However, since the typical copy will be one where the source
doesn't change at the same time, I am arguing that the non-hard copy
detection isn't much use.

> The last one is a compromise between performance and thoroughness,
> and the "harder" is one knob to tweak its behaviour.

I've been poking in tree-diff.c to see if I can understand why it is such a
performance hog.  I still haven't.  Each file is stored under its hash,
right?  So for copy detection, why can't you just search for other files with
the same hash, which I presume is very fast (as it is the basis of what makes
git so fast)?

I am probably misunderstanding git, but I guess that a copy isn't even needed
in the database, because two files with the same hash in the working copy
only need storing once and can then be referenced twice.
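To make the idea concrete, here is a rough sketch in Python (not git's code, and nothing to do with how tree-diff.c actually works).  The one git fact it relies on is that a blob's id is the SHA-1 of a "blob <size>\0" header followed by the file contents.  Everything else is an assumption for illustration: the names blob_id and exact_copies are made up, and a tree is modelled as a flat {path: id} dictionary, whereas real trees are nested.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Id git gives a blob: SHA-1 over a 'blob <size>' header, a NUL byte, then the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def exact_copies(parent_tree: dict, child_tree: dict) -> list:
    """Return (source, copy) pairs for paths that are new in child_tree
    and whose blob id already appears somewhere in parent_tree.

    Hypothetical helper: trees are flat {path: blob_id} dicts here.
    """
    # Index the parent tree by blob id; keep the first path (in sorted order) per id.
    source_by_id = {}
    for path, oid in sorted(parent_tree.items()):
        source_by_id.setdefault(oid, path)

    pairs = []
    for path, oid in sorted(child_tree.items()):
        # Only paths that did not exist in the parent can be copies.
        if path not in parent_tree and oid in source_by_id:
            pairs.append((source_by_id[oid], path))
    return pairs


content = b"hello\n"
tree1 = {"fileA": blob_id(content)}
tree2 = {"fileA": blob_id(content), "fileB": blob_id(content)}
print(exact_copies(tree1, tree2))  # [('fileA', 'fileB')]
```

Note that a lookup like this can only ever find byte-identical copies; a copy that was also edited in the same commit gets a different id and would not match.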

So for a copy (again, with my simple understanding of git) we'd have:

  commit1 -> tree1 -> fileA = fileA_hash
     ^
     |
  commit2 -> tree2 -> fileA = fileA_hash
                      fileB = fileB_hash

Doesn't that mean that copy detection is just a matter of searching the
parent commit trees for references to the same hash?


Andy

-- 
Dr Andy Parkins, M Eng (hons), MIEE
andyparkins@xxxxxxxxx