Re: icase pathspec magic support in ls-tree

Erik Cervin Edin <erik@xxxxxxxxxxx> · Fri, 14 Oct 2022 14:00:07 +0200

On Fri, Oct 14, 2022 at 10:58 AM Tao Klerks <tao@xxxxxxxxxx> wrote:
>
> I don't understand this suggestion; doesn't it only catch duplicates
> where both instances were introduced in the same 100-commit range?

Yes. It was a bit half-baked, but the main idea was to limit the check
to a smaller subset of the tree and incrementally check for newly
introduced duplicates, instead of searching the full tree every time.
I think that's basically Elijah's idea: get all (added?) files
introduced in a certain revision range (last change, since yesterday,
etc.) and then check only those against the tree for duplicates,
however you define duplicates.

On Fri, Oct 14, 2022 at 10:50 AM Tao Klerks <tao@xxxxxxxxxx> wrote:
>
> Directories have been the problem, in "my" repo, around one-third of
> the time - typically someone does a directory rename, and someone else
> does a bad merge and reintroduces the old directory.

That adds a bit of complexity :/
but should still be doable.
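
For the directory case on its own, here is a minimal sketch, assuming
plain POSIX awk and paths without embedded newlines. dir_case_dupes is
a made-up helper name, not a git command. It reads paths on stdin and
prints each group of leading directories that differ only by case.
awk's tolower() may only fold ASCII, depending on the implementation
and locale, so treat this as a starting point.

```sh
#!/bin/sh

# Hypothetical helper: read paths on stdin, print every leading
# directory that collides case-insensitively with another one.
dir_case_dupes() {
    awk -F/ '
        {
            prefix = ""
            # walk a/, a/b/, ... but not the final path component
            for (i = 1; i < NF; i++) {
                prefix = prefix $i "/"
                if (prefix in seen) continue
                seen[prefix] = 1
                key = tolower(prefix)
                if (key in first) {
                    # print the first spelling once, then each new one
                    if (!(key in reported)) {
                        print first[key]
                        reported[key] = 1
                    }
                    print prefix
                } else {
                    first[key] = prefix
                }
            }
        }
    '
}
```

Feeding it the whole tree would look like
"git ls-tree -r --name-only HEAD | dir_case_dupes". If a parent
directory collides, its subdirectories are reported as well.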

Not perfect, but maybe something along these lines for the incremental
check? (Caveat: possibly GNU only.)

#!/bin/sh

# files added between revisions x y
added_files() {
    git diff --diff-filter=A --name-only --no-renames $1 $2 ;
}

# folders of those added files
added_folders() {
    added_files $1 $2 |
        sed -e 's@^@./@' -e 's@/[^/]*$@/@' |
        sort -u ;
}

# all files tracked by git in *those* folders at HEAD
# (xargs splits on whitespace, so paths containing spaces are not handled)
possible_dupes() {
    added_folders $1 $2 |
        xargs git ls-tree --name-only HEAD ;
}

# prefix each path with its lowercased form, separated by \x1
# (\L, \E, \0 and \x1 in the replacement are GNU sed extensions)
# eg.
#path\x1PaTh
#path\x1path
case_insensitive() {
    sed -e 's@.*@\L\0\E\x1\0@' |
        sort ;
}

x=$1
y=$2
# Find all duplicate paths (case insensitive)
# in directories that gained files between $x and $y
possible_dupes $x $y |
    case_insensitive |
    awk -F '\x1' '
        # actual "duplicate" paths, column $2
        # as determined by case-insensitive column $1
        $1 in a { print a[$1]; print $2 }
        { a[$1]=$2 }
    '    | uniq
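
On the GNU-only caveat: the sed in case_insensitive is the part most
likely to break elsewhere. A possible replacement, sketched with POSIX
awk only, is below. case_insensitive_portable is a name I made up, and
I am assuming tolower() folds case well enough for the paths involved
(it may be ASCII-only on some systems). It emits the same two columns,
lowercased path and original path, separated by byte 0x01.

```sh
#!/bin/sh

# Hypothetical stand-in for case_insensitive without GNU sed:
# prints "<lowercased path>\001<original path>" per line, sorted.
case_insensitive_portable() {
    awk '{ printf "%s\001%s\n", tolower($0), $0 }' |
        sort
}
```

The awk stage at the end of the script still uses -F '\x1', which may
also need adjusting on a non-GNU awk.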