Re: [PATCH v2 4/4] gitdiffcore doc: mention new preliminary step for rename detection

On Tue, Feb 9, 2021 at 9:03 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Derrick Stolee <stolee@xxxxxxxxx> writes:
>
> >> +Note that when rename detection is on but both copy and break
> >> +detection are off, rename detection adds a preliminary step that first
> >> +checks files with the same basename.  If files with the same basename
> >
> > I find myself wanting a definition of 'basename' here, but perhaps I'm
> > just being pedantic. A quick search clarifies this as a standard term [1]
> > of which I was just ignorant.
> >
> > [1] https://man7.org/linux/man-pages/man3/basename.3.html
> >
> >> +are sufficiently similar, it will mark them as renames and exclude
> >> +them from the later quadratic step (the one that pairwise compares all
> >> +unmatched files to find the "best" matches, determined by the highest
> >> +content similarity).
>
> While I do not think `basename` is unacceptably bad, we should aim
> to do better.  For "direc/tory/hello.txt", both "hello.txt" and
> "hello" are what would come to people's minds with the technical
> term "basename" (i.e. basename as opposed to dirname, vs basename as
> opposed to filename with .extension).
>
> Avoiding this ambiguity and using a word understandable by those not
> well versed in UNIX/POSIX lingo may be done at the same time,
> hopefully.
>
> For example, can we frame the description around this key sentence:
>
>     The heuristic is based on the observation that a file is often
>     moved across directories while keeping its filename the same.
>
> The term "filename" alone can be ambiguous (i.e. both "hello.txt"
> and "direc/tory/hello.txt" are valid interpretations in the earlier
> example), but in the context of a sentence that talks about "moved
> across directories", the former would become the only valid one.  We
> can even say just "name" and there is no ambiguity in the above "key
> sentence".
>
> Then keeping that in mind, we can rewrite the above you quoted like
> so without going technical and without risking ambiguity, like this:
>
>     ... a preliminary step that checks if files are moved across
>     directories while keeping their filenames the same.  If there is
>     a file added to a directory whose contents are sufficiently
>     similar to a file with the same name that got deleted from a
>     different directory, ...

Nice, I like it!
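
To make the proposed wording concrete, here is a rough Python sketch of
the idea as described above: pair a deleted file with an added file that
has the same name in a different directory, if their contents are
sufficiently similar.  This is only an illustration, not the code in the
patch series.  The function names, the use of difflib for scoring, and
the 0.5 threshold are all assumptions made for the example.

```python
import os
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in score in [0, 1]; an assumption for this sketch, not git's metric.
    return SequenceMatcher(None, a, b).ratio()


def pair_same_name_moves(deleted, added, threshold=0.5):
    """Pair deleted and added paths that share a name across directories.

    `deleted` and `added` map path -> content.  Returns {old_path: new_path}.
    Anything left unpaired would go on to the later pairwise comparison.
    """
    # Index added paths by the last path component ("hello.txt").
    added_by_name = {}
    for path in added:
        added_by_name.setdefault(os.path.basename(path), []).append(path)

    renames = {}
    taken = set()
    for old_path, old_content in deleted.items():
        best, best_score = None, threshold
        for new_path in added_by_name.get(os.path.basename(old_path), []):
            # Only consider moves to a different directory, one pairing per file.
            if new_path in taken:
                continue
            if os.path.dirname(new_path) == os.path.dirname(old_path):
                continue
            score = similarity(old_content, added[new_path])
            if score >= best_score:
                best, best_score = new_path, score
        if best is not None:
            renames[old_path] = best
            taken.add(best)
    return renames


deleted = {
    "direc/tory/hello.txt": "hello world\nline two\n",
    "a/notes.txt": "aaaa",
}
added = {
    "other/hello.txt": "hello world\nline 2\n",
    "b/notes.txt": "zzzz",
    "b/new.txt": "hello world\nline two\n",
}
renames = pair_same_name_moves(deleted, added)
```

Here "direc/tory/hello.txt" is paired with "other/hello.txt".  The two
"notes.txt" files share a name but are too dissimilar, so they stay
unpaired, and "b/new.txt" is never considered in this step because no
deleted file has that name.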


