On Fri, Oct 14, 2022 at 1:48 AM Tao Klerks <tao@xxxxxxxxxx> wrote:
>
> On Fri, Oct 14, 2022 at 9:41 AM Elijah Newren <newren@xxxxxxxxx> wrote:
> > [...]
> > I don't see why you need to do full-tree with existing options, nor
> > why the ls-tree option you want would somehow make it easier to avoid.
> > I think you can avoid the full-tree search with something like:
> >
> > git diff --diff-filter=A --no-renames --name-only $OLDHASH $NEWHASH |
> > sed -e s%/[^/]*$%/% | uniq | xargs git ls-tree --name-only $NEWHASH |
> > \
> > sort | uniq -i -d
> >
> > The final "sort | uniq -i -d" is taken from Torsten's suggestion.
> >
> > The git diff ... xargs git ls-tree section on the first line will
> > provide a list of all files (& subdirs) in the same directory as any
> > added file. (Although, it has a blind spot for paths in the toplevel
> > directory.)
>
> The theoretical problem with this approach is that it only addresses
> case-insensitive-duplicate files, not directories.

It'll catch some case-insensitive-duplicate directories too -- note
that I did call out that it'd print subdirs. But to be more cautious,
you would need to carefully grab all leading directories of any added
path, not just the immediate leading directory.

> Directories have been the problem, in "my" repo, around one-third of
> the time - typically someone does a directory rename, and someone else
> does a bad merge and reintroduces the old directory.
>
> That said, what "icase pathspec magic" actually *does*, is break down
> the pathspec into iteratively more complete paths, level by level,
> looking for case-duplicates at each level. That's something I could
> presumably do in shell scripting, collecting all the interesting
> sub-paths first, and then getting ls-tree to tell me about the
> immediate children for each sub-path, doing case-insensitive dupe
> searches across children for each of these sub-paths.
>
> ls-tree supporting icase pathspec magic would clearly be more
> efficient (I wouldn't need N ls-tree git processes, where N is the
> number of sub-paths in the diff), but this should be plenty efficient
> for normal commits, with a fallback to the full search
>
> This seems like a sensible direction, I'll have a play.

If you create a script that gives you all leading directories of any
listed path (plus replacing the toplevel dir with ':/'), such as this
(which I'm calling 'all-leading-dirs.py'):

"""
#!/usr/bin/env python3

import os
import sys

paths = sys.stdin.read().splitlines()
dirs_seen = set()
for path in paths:
    dir = path
    while dir:
        dir = os.path.dirname(dir)
        if dir in dirs_seen:
            continue
        dirs_seen.add(dir)
if dirs_seen:
    # Replace top-level dir of "" with ":"; we'll add the trailing '/'
    # below when adding it to all other dirs
    dirs_seen.remove("")
    dirs_seen.add(':')
for dir in dirs_seen:
    # ls-tree wants the trailing '/' if we are going to list contents
    # within that tree rather than just the tree itself
    print(dir+'/')
"""

Then the following will catch duplicate files and directories for
you:

git diff --diff-filter=A --no-renames --name-only HEAD~1 HEAD |
    all-leading-dirs.py |
    xargs --no-run-if-empty git ls-tree --name-only -t HEAD |
    sort | uniq -i -d

and it no longer has problems catching duplicates in the toplevel
directory either. It does have (at most) two git invocations, but
only one invocation of ls-tree.

Here's a test script to prove it works:

"""
#!/bin/bash

git init -b main nukeme
cd nukeme
mkdir -p dir1/subdir/whatever
mkdir -p dir2/subdir/whatever
>dir1/subdir/whatever/foo
>dir2/subdir/whatever/foo
git add .
git commit -m initial

mkdir -p dir1/SubDir/whatever
>dir1/SubDir/whatever/foo
git add .
git commit -m stuff

git diff --diff-filter=A --no-renames --name-only HEAD~1 HEAD |
    all-leading-dirs.py |
    xargs --no-run-if-empty git ls-tree --name-only -t HEAD |
    sort | uniq -i -d
"""

The output of this script is

"""
dir1/subdir
"""

which correctly notifies on the duplicate (dir1/SubDir being the
other; uniq is the one that picks which of the two duplicate names to
print).
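
If you'd rather see both colliding names instead of whichever one uniq
happens to print, the final "sort | uniq -i -d" stage could be swapped
for a small filter along these lines. This is only a rough sketch: the
function name find_icase_dupes and the file name icase-dupes.py are
made up for illustration, and str.casefold() is merely an
approximation of how any particular case-insensitive filesystem
compares names.

```python
#!/usr/bin/env python3
# Hypothetical helper (assumed name: icase-dupes.py), not part of the
# pipeline above. Reads one name per line on stdin, e.g. the output of
# "git ls-tree --name-only -t", and prints each group of names that
# differ only by case.
import sys
from collections import defaultdict


def find_icase_dupes(names):
    """Return sorted lists of distinct names that collide ignoring case."""
    groups = defaultdict(set)
    for name in names:
        # casefold() approximates case-insensitive comparison; exact
        # repeats of the same name land in the set only once.
        groups[name.casefold()].add(name)
    return [sorted(group)
            for _, group in sorted(groups.items())
            if len(group) > 1]


if __name__ == "__main__":
    for group in find_icase_dupes(sys.stdin.read().splitlines()):
        print(" ".join(group))
```

It would go at the end of the same pipeline in place of
"sort | uniq -i -d", and since it groups names itself it does not rely
on sort placing the colliding names on adjacent lines.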