Re: [PATCH v2 4/4] gitdiffcore doc: mention new preliminary step for rename detection

Derrick Stolee <stolee@xxxxxxxxx> writes:

>> +Note that when rename detection is on but both copy and break
>> +detection are off, rename detection adds a preliminary step that first
>> +checks files with the same basename.  If files with the same basename
>
> I find myself wanting a definition of 'basename' here, but perhaps I'm
> just being pedantic. A quick search clarifies this as a standard term [1]
> of which I was just ignorant.
>
> [1] https://man7.org/linux/man-pages/man3/basename.3.html
>
>> +are sufficiently similar, it will mark them as renames and exclude
>> +them from the later quadratic step (the one that pairwise compares all
>> +unmatched files to find the "best" matches, determined by the highest
>> +content similarity).

While I do not think `basename` is unacceptably bad, we should aim
to do better.  For "direc/tory/hello.txt", both "hello.txt" and
"hello" are what would come to people's minds with the technical
term "basename" (i.e. basename as opposed to dirname, vs basename as
opposed to filename with .extension).
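
To make the two readings concrete, here is a small sketch in Python
(chosen only for illustration; the variable names are made up for this
example).  It uses `posixpath` so that the result does not depend on
the platform's path separator:

```python
import posixpath

path = "direc/tory/hello.txt"

# Reading 1: basename as opposed to dirname (the final path component).
name_with_ext = posixpath.basename(path)

# Reading 2: basename as opposed to the name with its .extension.
name_without_ext = posixpath.splitext(name_with_ext)[0]

# The directory part that reading 1 strips away.
directory = posixpath.dirname(path)

print(name_with_ext, name_without_ext, directory)
```

The first reading yields "hello.txt" and the second yields "hello";
the documentation under discussion means the first.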

Avoiding this ambiguity and using a word understandable by those not
well versed in UNIX/POSIX lingo can hopefully be done at the same
time.

For example, can we frame the description around this key sentence:

    The heuristic is based on the observation that a file is often
    moved across directories while keeping its filename the same.

The term "filename" alone can be ambiguous (i.e. both "hello.txt"
and "direc/tory/hello.txt" are valid interpretations in the earlier
example), but in the context of a sentence that talks about "moved
across directories", the former would become the only valid one.  We
can even say just "name" and there is no ambiguity in the above "key
sentence".

Then, keeping that in mind, we can rewrite the text you quoted above
without going technical and without risking ambiguity, like this:

    ... a preliminary step that checks if files are moved across
    directories while keeping their filenames the same.  If there is
    a file added to a directory whose contents are sufficiently
    similar to a file with the same name that got deleted from a
    different directory, ...
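
For readers who prefer code to prose, the preliminary step described
above could be sketched roughly as follows.  This is only an
illustration under stated assumptions, not what the patch series
implements: the function name `match_same_name_moves`, the 0.5
threshold, the use of `difflib.SequenceMatcher` as the similarity
measure, and the choice to consider only names that appear exactly
once on each side are all made up for this example.

```python
import posixpath
from collections import defaultdict
from difflib import SequenceMatcher


def match_same_name_moves(deleted, added, threshold=0.5):
    """Pair deleted and added files that share a name and are similar.

    `deleted` and `added` map a path to the file's contents (str).
    Returns (renames, remaining_deleted, remaining_added); only the
    remaining files would go on to the later pairwise comparison.
    """
    def group_by_name(files):
        groups = defaultdict(list)
        for path in files:
            # "name" here is the final path component, e.g. "hello.txt".
            groups[posixpath.basename(path)].append(path)
        return groups

    deleted_by_name = group_by_name(deleted)
    added_by_name = group_by_name(added)

    renames = {}
    for name, old_paths in deleted_by_name.items():
        new_paths = added_by_name.get(name, [])
        # Assumption of this sketch: skip names that are not unique
        # on both sides, and leave them to the later step.
        if len(old_paths) != 1 or len(new_paths) != 1:
            continue
        old, new = old_paths[0], new_paths[0]
        similarity = SequenceMatcher(None, deleted[old], added[new]).ratio()
        if similarity >= threshold:
            renames[old] = new

    remaining_deleted = {p for p in deleted if p not in renames}
    matched_new = set(renames.values())
    remaining_added = {p for p in added if p not in matched_new}
    return renames, remaining_deleted, remaining_added


deleted = {"src/hello.txt": "hello world\nfoo\n", "src/other.c": "int main"}
added = {"doc/hello.txt": "hello world\nfoo!\n", "lib/new.c": "something else"}
renames, remaining_deleted, remaining_added = match_same_name_moves(deleted, added)
```

In this example "src/hello.txt" and "doc/hello.txt" are paired up as
a move across directories, while "src/other.c" and "lib/new.c" are
left for the later, more expensive comparison.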
