Re: [PATCH v3 5/5] gitdiffcore doc: mention new preliminary step for rename detection

"Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Elijah Newren <newren@xxxxxxxxx>
>
> The last few patches have introduced a new preliminary step when rename
> detection is on but both break detection and copy detection are off.
> Document this new step.  While we're at it, add a testcase that checks
> the new behavior as well.
>
> Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
> ---
>  Documentation/gitdiffcore.txt | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
> index c970d9fe438a..36ebe364d874 100644
> --- a/Documentation/gitdiffcore.txt
> +++ b/Documentation/gitdiffcore.txt
> @@ -168,6 +168,23 @@ a similarity score different from the default of 50% by giving a
>  number after the "-M" or "-C" option (e.g. "-M8" to tell it to use
>  8/10 = 80%).
>  
> +Note that when rename detection is on but both copy and break
> +detection are off, rename detection adds a preliminary step that first
> +checks if files are moved across directories while keeping their
> +filename the same.  If there is a file added to a directory whose
> +contents is sufficiently similar to a file with the same name that got
> +deleted from a different directory, it will mark them as renames and
> +exclude them from the later quadratic step (the one that pairwise
> +compares all unmatched files to find the "best" matches, determined by
> +the highest content similarity).  So, for example, if
> +docs/extensions.txt and docs/config/extensions.txt have similar
> +content, then they will be marked as a rename even if it turns out
> +that docs/extensions.txt was more similar to src/extension-checks.c.

I'd rather use docs/extensions.md instead of src/extension-checks.c;
it is more realistic for a .md file, rather than a .c file, to be
similar to a .txt file.
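
To make the candidate rule concrete, here is a minimal C sketch of the
filename test the preliminary step relies on: the same basename, but a
different path. The helper names path_basename() and
is_same_basename_move() are hypothetical and are not taken from
diffcore-rename.c. Content similarity is deliberately left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: the part of 'path' after its last '/'. */
static const char *path_basename(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/*
 * Hypothetical helper: nonzero if a deleted path and an added path are
 * candidates for the preliminary step, i.e. they share a filename but
 * are not the same path (so they live in different directories).
 */
static int is_same_basename_move(const char *deleted, const char *added)
{
	assert(deleted && added);
	return !strcmp(path_basename(deleted), path_basename(added)) &&
	       strcmp(deleted, added) != 0;
}
```

Under this test docs/config/extensions.txt is a candidate for a deleted
docs/extensions.txt, while docs/extensions.md is not; that file is only
compared in the later quadratic step.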

With a raised bar for this step, the equation changes a bit, no?  

    So, for example, if a deleted docs/ext.txt and an added
    docs/config/ext.txt are similar enough, they will be marked as a
    rename and prevent an added docs/ext.md that may be even more similar
    to the deleted docs/ext.txt from being considered as the rename
    destination in the later step.  For this reason, the preliminary
    "match same filename" step uses a somewhat higher threshold to mark a
    file pair as a rename and stop considering other candidates for
    better matches.

or something?
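
The raised bar can be sketched as simple arithmetic. This assumes
scores expressed as percentages and a bar placed halfway between the
normal minimum score and a perfect match; the halfway factor and the
function names are illustrative assumptions, not values stated in this
thread.

```c
#include <assert.h>

/*
 * Hypothetical: the stricter minimum used by the "match same filename"
 * step, assumed here to sit halfway between the normal minimum score
 * (e.g. 50 for the default, 80 for "-M8") and 100.
 */
static int basename_min_score(int min_score)
{
	assert(min_score >= 0 && min_score <= 100);
	return min_score + (100 - min_score) / 2;
}

/* Accept a same-filename pair early only if it clears the raised bar. */
static int accept_early_rename(int score, int min_score)
{
	return score >= basename_min_score(min_score);
}
```

With the default 50% minimum, this sketch demands 75% for an early
match. A same-filename pair that is only 70% similar is then left for
the quadratic step, where a better candidate such as docs/ext.md can
still win.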

> +At most, one comparison is done per file in this preliminary pass; so
> +if there are several extensions.txt files throughout the directory
> +hierarchy that were added and deleted, this preliminary step will be
> +skipped for those files.
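
The "at most one comparison per file" rule amounts to requiring a
basename to occur exactly once among the deleted paths and exactly once
among the added ones. A minimal sketch with hypothetical helper names;
linear scans keep it short, where a real implementation would more
likely use a hash map:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: count paths among paths[0..n-1] whose filename is 'name'. */
static size_t count_basename(const char *const *paths, size_t n,
			     const char *name)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		const char *slash = strrchr(paths[i], '/');
		const char *base = slash ? slash + 1 : paths[i];

		if (!strcmp(base, name))
			count++;
	}
	return count;
}

/*
 * Hypothetical: nonzero if 'name' is eligible for the preliminary step,
 * i.e. exactly one deleted and exactly one added path carry it, so each
 * file gets at most one comparison.
 */
static int basename_is_unique(const char *const *deleted, size_t n_deleted,
			      const char *const *added, size_t n_added,
			      const char *name)
{
	assert(name);
	return count_basename(deleted, n_deleted, name) == 1 &&
	       count_basename(added, n_added, name) == 1;
}
```

If extensions.txt was deleted from two directories, basename_is_unique()
returns 0 and both files simply fall through to the normal rename
detection.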

Other than that, the whole series looked sensible to my cursory
read.

Thanks.
