Derrick Stolee <stolee@xxxxxxxxx> writes:

>> +Note that when rename detection is on but both copy and break
>> +detection are off, rename detection adds a preliminary step that first
>> +checks files with the same basename. If files with the same basename
>
> I find myself wanting a definition of 'basename' here, but perhaps I'm
> just being pedantic. A quick search clarifies this as a standard term [1]
> of which I was just ignorant.
>
> [1] https://man7.org/linux/man-pages/man3/basename.3.html
>
>> +are sufficiently similar, it will mark them as renames and exclude
>> +them from the later quadratic step (the one that pairwise compares all
>> +unmatched files to find the "best" matches, determined by the highest
>> +content similarity).

While I do not think `basename` is unacceptably bad, we should aim
to do better.

For "direc/tory/hello.txt", either "hello.txt" or "hello" is what
would come to people's minds with the technical term "basename"
(i.e. basename as opposed to dirname, vs basename as opposed to
filename with .extension). Avoiding this ambiguity and using a word
understandable by those not well versed in UNIX/POSIX lingo may be
done at the same time, hopefully.

For example, can we frame the description around this key sentence:

    The heuristic is based on the observation that a file is often
    moved across directories while keeping its filename the same.

The term "filename" alone can be ambiguous (i.e. both "hello.txt"
and "direc/tory/hello.txt" are valid interpretations in the earlier
example), but in the context of a sentence that talks about "moved
across directories", the former becomes the only valid one. We can
even say just "name" and there is no ambiguity in the above "key
sentence".

Then, keeping that in mind, we can rewrite the part you quoted above
without going technical and without risking ambiguity, like this:

    ... a preliminary step that checks if files are moved across
    directories while keeping their filenames the same. If there is
    a file added to a directory whose contents are sufficiently
    similar to a file with the same name that got deleted from a
    different directory, ...
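
To make the preliminary step concrete, here is a minimal Python sketch
of the idea described above. It is an illustration only, not Git's
implementation (which is written in C). The names
unique_by_name(), match_same_name_renames() and min_similarity are
hypothetical, difflib's ratio is just a stand-in for whatever
similarity measure Git uses, and handling only filenames that are
unique on each side is a simplifying assumption.

```python
import os.path
from difflib import SequenceMatcher


def unique_by_name(paths):
    """Map filename -> path, keeping only filenames that occur exactly once."""
    seen = {}
    for path in paths:
        # "filename" here means the last path component, e.g. "hello.txt".
        seen.setdefault(os.path.basename(path), []).append(path)
    return {name: found[0] for name, found in seen.items() if len(found) == 1}


def match_same_name_renames(deleted, added, min_similarity=0.5):
    """Pair deleted and added files that share a filename and are similar.

    `deleted` and `added` map full paths to file contents (hypothetical
    inputs). Returns the detected renames plus the paths left over for a
    later, more expensive all-pairs comparison.
    """
    deleted_by_name = unique_by_name(deleted)
    added_by_name = unique_by_name(added)

    renames = {}
    for name, old_path in deleted_by_name.items():
        new_path = added_by_name.get(name)
        if new_path is None:
            continue
        # Stand-in similarity score in [0, 1]; not Git's actual metric.
        score = SequenceMatcher(None, deleted[old_path], added[new_path]).ratio()
        if score >= min_similarity:
            renames[old_path] = new_path

    matched_new = set(renames.values())
    leftover_deleted = sorted(p for p in deleted if p not in renames)
    leftover_added = sorted(p for p in added if p not in matched_new)
    return renames, leftover_deleted, leftover_added


# A file moved across directories keeps its filename; an unrelated
# deletion and addition are left for the later pairwise step.
deleted = {"src/hello.txt": "hello world\nfoo\n", "src/other.c": "int main(){}"}
added = {"lib/hello.txt": "hello world\nfoo\nbar\n", "lib/new.c": "completely different"}
renames, leftover_deleted, leftover_added = match_same_name_renames(deleted, added)
```

Only the files left over after this step would need to go through the
pairwise comparison, which is where the saving comes from. The sketch
also shows why the wording matters: os.path.basename() gives
"hello.txt" for "direc/tory/hello.txt", never "hello".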