Re: Rename detection at git log

On Monday 2006 November 20 10:48, Junio C Hamano wrote:

> I wrote the code and you contradict me ;-)?

Sorry; I wasn't disputing that the filtering works exactly as you say (of
course it does; I don't know anywhere near enough to claim otherwise).

However, I do think the problem is not one of filtering.  My point was that
"-C" on its own has no practical use.

> in your example, it would give you the creation of fileB, not
> copy.

I'm sure it would; but to get it reported as a copy you had to use
--find-copies-harder.  Plain -C would not find it as a copy.

>  - Renames are only picked up from files that were lost in the
>    same change (i.e. "mv fileA fileB" creates fileB and loses
>    fileA; fileB is checked if it is similar to fileA in the
>    original).
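A minimal sketch of the rename rule described above. It assumes a toy model in which a tree is a Python dict mapping each path to its contents. `find_renames` and `similarity` are hypothetical names. `difflib.SequenceMatcher` stands in for git's own similarity scoring, and the 0.5 threshold loosely mirrors the 50% default of -M.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in score in [0, 1]; git's real scorer is not difflib-based.
    return SequenceMatcher(None, a, b).ratio()


def find_renames(old_tree: dict, new_tree: dict, threshold: float = 0.5) -> list:
    """Pair each newly created path with the most similar path lost in the same change."""
    lost = [p for p in old_tree if p not in new_tree]     # e.g. fileA after "mv fileA fileB"
    created = [p for p in new_tree if p not in old_tree]  # e.g. fileB after "mv fileA fileB"
    renames = []
    for dst in created:
        best_src, best_score = None, threshold
        for src in lost:  # only files lost in this change are rename candidates
            score = similarity(old_tree[src], new_tree[dst])
            if score >= best_score:
                best_src, best_score = src, score
        if best_src is not None:
            renames.append((best_src, dst, round(best_score, 2)))
    return renames


old = {"fileA": "line one\nline two\n", "README": "docs\n"}
new = {"fileB": "line one\nline two\n", "README": "docs\n"}
print(find_renames(old, new))  # [('fileA', 'fileB', 1.0)]
```

The sketch is deliberately simplified. For example, it does not stop two new files from claiming the same lost source.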

I've found rename detection to be flawless in all my uses.

>  - Copies are only picked up from files that were changed in the
>    same change (i.e. splitting major part of original file and
>    moving it to somewhere else, while leaving a skeleton in the
>    original file).  "harder" is needed if the copy original was
>    untouched, as you found out.
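This toy sketch makes the two candidate sets concrete. It again assumes trees are dicts from path to contents. `copy_sources` and `pairs_to_score` are hypothetical names, and counting removed files as sources alongside modified ones is a simplifying assumption. The sketch skips the similarity scoring itself and only counts how many (source, new file) pairs would need scoring.

```python
def copy_sources(old_tree: dict, new_tree: dict, find_copies_harder: bool = False) -> set:
    """Return the old-tree paths treated as possible copy sources (simplified model)."""
    if find_copies_harder:
        # "harder": every file in the old tree is a candidate, touched or not.
        return set(old_tree)
    # Plain -C: only files this change touched (modified or removed).
    return {p for p in old_tree if new_tree.get(p) != old_tree[p]}


def pairs_to_score(old_tree: dict, new_tree: dict, find_copies_harder: bool = False) -> int:
    # Each created path has to be compared against every candidate source.
    created = [p for p in new_tree if p not in old_tree]
    return len(created) * len(copy_sources(old_tree, new_tree, find_copies_harder))


# The case from this thread: fileB is copied from fileA, and fileA is untouched.
old = {"fileA": "shared text\n", "Makefile": "all:\n"}
new = {"fileA": "shared text\n", "Makefile": "all:\n", "fileB": "shared text\n"}
print(copy_sources(old, new))                                   # set()
print(sorted(copy_sources(old, new, find_copies_harder=True)))  # ['Makefile', 'fileA']
```

With plain -C the untouched fileA is never considered, so fileB shows up only as a creation. With "harder", the number of pairs grows with the size of the whole old tree rather than with the size of the change.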

Yep; I understand that.  I also understand that it is done for performance
reasons.  However, since in a typical copy the source doesn't change at the
same time, I am arguing that copy detection without "harder" isn't much use.

> The last one is a compromise between performance and thoroughness,
> and the "harder" is one knob to tweak its behaviour.

I've been poking around in tree-diff.c to see if I can understand why it is
such a performance hog.  I still haven't.  Each file is stored under its hash,
right?  So for copy detection, why can't you just search for other files with
the same hash, which I presume is very fast (as it is the basis of what makes
git so fast)?

I am probably misunderstanding git, but I guess that a copy doesn't even need
a new object in the database, because two files with the same hash in the
working copy only need storing once and then referencing twice.  So for a copy
(again, with my simple understanding of git) we'd have:

 commit1 -> tree1 -> fileA = fileA_hash
    ^
    |
 commit2 -> tree2 -> fileA = fileA_hash
                     fileB = fileA_hash

Doesn't that mean that copy detection is just a matter of searching the parent 
commit trees for references to the same hash?
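The hash lookup proposed above can be sketched as follows. Trees are modelled as dicts from path to blob hash, and `blob_hash` and `exact_copies` are hypothetical helpers, not git internals. The hash follows git's blob naming: SHA-1 over a `blob <size>\0` header plus the contents. This only finds byte-identical copies. A copy that is edited in the same commit gets a different hash, and that case is what similarity-based -C detection is for.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    # Git names a blob by SHA-1 over "blob <size>\0" followed by the raw bytes.
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


def exact_copies(parent_tree: dict, tree: dict) -> list:
    """Return (source, dest) pairs where a new path reuses a blob from the parent tree."""
    by_hash = {}
    for path, h in parent_tree.items():
        by_hash.setdefault(h, []).append(path)  # one hash may name several paths
    copies = []
    for path, h in tree.items():
        if path in parent_tree:
            continue  # only paths that are new in this commit can be copies
        copies.extend((src, path) for src in by_hash.get(h, []))
    return copies


data = blob_hash(b"some file contents\n")
tree1 = {"fileA": data}
tree2 = {"fileA": data, "fileB": data}  # fileB is a byte-identical copy of fileA
print(exact_copies(tree1, tree2))  # [('fileA', 'fileB')]
```

Building the index is a single pass over the parent tree, so exact matches are cheap to find. The expensive part is scoring near-matches, and a hash lookup cannot do that.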


Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE
andyparkins@xxxxxxxxx