Re: [PATCH v3 5/5] gitdiffcore doc: mention new preliminary step for rename detection

On Wed, Feb 10, 2021 at 8:41 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
> > From: Elijah Newren <newren@xxxxxxxxx>
> >
> > The last few patches have introduced a new preliminary step when rename
> > detection is on but both break detection and copy detection are off.
> > Document this new step.  While we're at it, add a testcase that checks
> > the new behavior as well.
> >
> > Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
> > ---
> >  Documentation/gitdiffcore.txt | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> >
> > diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
> > index c970d9fe438a..36ebe364d874 100644
> > --- a/Documentation/gitdiffcore.txt
> > +++ b/Documentation/gitdiffcore.txt
> > @@ -168,6 +168,23 @@ a similarity score different from the default of 50% by giving a
> >  number after the "-M" or "-C" option (e.g. "-M8" to tell it to use
> >  8/10 = 80%).
> >
> > +Note that when rename detection is on but both copy and break
> > +detection are off, rename detection adds a preliminary step that first
> > +checks if files are moved across directories while keeping their
> > +filename the same.  If there is a file added to a directory whose
> > +contents is sufficiently similar to a file with the same name that got
> > +deleted from a different directory, it will mark them as renames and
> > +exclude them from the later quadratic step (the one that pairwise
> > +compares all unmatched files to find the "best" matches, determined by
> > +the highest content similarity).  So, for example, if
> > +docs/extensions.txt and docs/config/extensions.txt have similar
> > +content, then they will be marked as a rename even if it turns out
> > +that docs/extensions.txt was more similar to src/extension-checks.c.
>
> I'd rather use docs/extensions.md instead of src/extension-checks.c;
> it would be more realistic for .md to be similar to .txt than .c.
>
> With a raised bar for this step, the equation changes a bit, no?
>
>     So, for example, if a deleted docs/ext.txt and an added
>     docs/config/ext.txt are similar enough, they will be marked as a
>     rename and prevent an added docs/ext.md that may be even more similar
>     to the deleted docs/ext.txt from being considered as the rename
>     destination in the later step.  For this reason, the preliminary
>     "match same filename" step uses a bit higher threshold to mark a
>     file pair as a rename and stop considering other candidates for
>     better matches.
>
> or something?

Good points; I've updated the docs locally to reflect your
suggestions.  I'll wait a bit for any other feedback and then send out
a new round with this update.
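
To make the reworded example concrete, here is a rough Python sketch of
the preliminary "match same filename" step as the quoted text describes
it.  It is not the code in diffcore-rename.c: the function names, the
0.75 threshold, and the use of difflib as the similarity score are all
assumptions made for illustration, and it assumes each basename occurs
at most once on each side.

```python
from difflib import SequenceMatcher
from os.path import basename


def similarity(old: str, new: str) -> float:
    # Stand-in score in [0, 1]; git computes its own similarity estimate.
    return SequenceMatcher(None, old, new).ratio()


def basename_prepass(deleted, added, threshold=0.75):
    """Pair deleted and added paths that share a basename.

    deleted/added map path -> content.  threshold is a hypothetical
    value, chosen only to be higher than the default 50%.
    Assumes each basename occurs at most once per side.
    """
    added_by_name = {basename(path): path for path in added}
    renames = {}
    for src, old_content in deleted.items():
        dst = added_by_name.get(basename(src))
        # One comparison per deleted file: only the same-named candidate.
        if dst is not None and similarity(old_content, added[dst]) >= threshold:
            renames[src] = dst
    return renames


old = "one\ntwo\nthree\nfour\nfive\nsix\n"
deleted = {"docs/ext.txt": old}
added = {
    "docs/config/ext.txt": old + "seven\n",  # same name, slightly edited
    "docs/ext.md": old,                      # identical, but different name
}
print(basename_prepass(deleted, added))
```

In this sketch docs/ext.md is never compared against docs/ext.txt, even
though its content is identical, because the same-named pair already
cleared the (assumed) higher threshold.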

> > +At most, one comparison is done per file in this preliminary pass; so
> > +if there are several extensions.txt files throughout the directory
> > +hierarchy that were added and deleted, this preliminary step will be
> > +skipped for those files.
>
> Other than that, the whole series looked sensible to my cursory
> read.

Thanks.
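
For the "at most one comparison per file" paragraph quoted above, one
possible reading is that a basename only qualifies for the preliminary
pass when it is unique among the deleted files and unique among the
added files.  The Python sketch below illustrates that reading; the
function name and the exact uniqueness rule are my assumptions here,
not a description of the actual implementation.

```python
from collections import Counter
from os.path import basename


def unique_basename_candidates(deleted_paths, added_paths):
    """Return (deleted, added) path pairs eligible for the preliminary pass.

    Assumed rule: the shared basename must appear exactly once among the
    deleted paths and exactly once among the added paths.
    """
    deleted_counts = Counter(basename(p) for p in deleted_paths)
    added_counts = Counter(basename(p) for p in added_paths)
    # Only consulted for names that occur once, so overwrites are harmless.
    added_by_name = {basename(p): p for p in added_paths}

    pairs = []
    for src in deleted_paths:
        name = basename(src)
        if deleted_counts[name] == 1 and added_counts[name] == 1:
            pairs.append((src, added_by_name[name]))
    return pairs


deleted_paths = ["docs/extensions.txt", "a/README", "b/README"]
added_paths = ["docs/config/extensions.txt", "c/README"]
print(unique_basename_candidates(deleted_paths, added_paths))
```

Here the README files are left for the later pairwise step because the
name is ambiguous on the deleted side, while extensions.txt yields a
single candidate pair.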


