Re: Basename matching during rename/copy detection

"Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes:

> So Govind Salinas has found an interesting case in the rename
> detection code:
>
>   $ git clone git://repo.or.cz/Widgit.git
>   $ git diff -M --raw -r 192e^ 192e | grep .resx
>   :100755 000000 4c8ab79... 0000000... D  Form1.resx
>   :100755 100755 9e70146... 9e70146... R100       CommitViewer.resx       UI/CommitViewer.resx
>   :100755 100755 90929fd... b40ff98... C091       RepoManager.resx        UI/Form1.resx
>   :100755 100755 90929fd... 90929fd... C100       PreferencesEditor.resx  UI/PreferencesEditor.resx
>   :100755 100755 90929fd... 90929fd... R100       PreferencesEditor.resx  UI/RepoManager.resx
>   :100755 100755 90929fd... 8535007... R097       RepoManager.resx        UI/RepoTreeView.resx
>
> In this case several files had identical old images, and some
> kept that old image during the rename.  Unfortunately because of
> the ordering of the files in the tree Git has decided to "rename"
> the PreferencesEditor.resx file to UI/RepoManager.resx, rather than
> renaming RepoManager.resx to UI/RepoManager.resx.  Go Git.
>
> I'm wondering if we shouldn't play the game of trying to match
> delete/add pairs up by not only similarity, but also by path
> basename.  In the case above it's exactly what Govind thought should
> happen; he moved the file from one directory to another, and didn't
> even change its content during the move.  But Git decided "better"
> to use a totally different file in the "rename".

Actually, git did not decide anything, and certainly not better.

Having many "identical files" in the preimage is just stupid to
begin with (if you know they are identical, why are you storing
copies instead of having your build procedure reuse the same file?),
so the algorithm did not bother finding a better match among
"equals".

I am not opposed to a patch that says "Ok, these two preimages
have identical similarity scores, *AND* indeed the preimages have
the same contents, so we tiebreak them with other heuristics to
help stupid projects better".  And I can see basename similarity
as one of the useful heuristics you could use.
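
To make the idea concrete, here is a rough sketch of such a tiebreak.
It is written in Python rather than git's C purely for illustration;
Candidate and pick_source are made-up names, not anything in git's
source, and it assumes some earlier step has already computed one
similarity score per (source, destination) pair.

```python
import posixpath
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    """One possible rename source for a destination (hypothetical type)."""
    src_path: str   # path of the deleted/old file
    src_blob: str   # object name of the preimage contents
    score: int      # similarity against the destination, 0..100


def pick_source(dst_path: str, candidates: Sequence[Candidate]) -> Candidate:
    """Pick the rename source for dst_path.

    A higher score always wins.  Only when two candidates have the same
    score AND the same preimage blob do we fall back to the basename:
    the candidate whose basename equals the destination's is preferred.
    Otherwise the earlier candidate is kept, as before.
    """
    if not candidates:
        raise ValueError("no candidates for " + dst_path)

    dst_base = posixpath.basename(dst_path)
    best = candidates[0]
    for cand in candidates[1:]:
        if cand.score > best.score:
            best = cand
        elif (cand.score == best.score
              and cand.src_blob == best.src_blob
              and posixpath.basename(cand.src_path) == dst_base
              and posixpath.basename(best.src_path) != dst_base):
            # Equal score, identical contents: tiebreak on basename.
            best = cand
    return best


if __name__ == "__main__":
    # The two identical preimages from the report above.
    sources = [
        Candidate("PreferencesEditor.resx", "90929fd", 100),
        Candidate("RepoManager.resx", "90929fd", 100),
    ]
    print(pick_source("UI/RepoManager.resx", sources).src_path)
```

With the two preimages from the report, UI/RepoManager.resx pairs up
with RepoManager.resx instead of whichever identical file happens to
sort first.  When the scores or the preimage contents differ, the
basename is never consulted, so existing results stay as they are.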



