On Fri, Oct 14, 2022 at 9:41 AM Elijah Newren <newren@xxxxxxxxx> wrote:
>
> How exactly would case-insensitive matching in ls-tree help you here?

I just attempted to demonstrate in response to Torsten's email; the
assumption was that I could list the added files' paths in a
case-insensitive pathspec, and thereby get all duplicates fast and
efficiently, for reasonably-sized commits. (new refs and very large
commits would still need a full-tree dupe scan)

> Can't you write a hook without such capability that rejects such
> collisions?

It is possible, but far less convenient and I'm not confident that my
shell scripting abilities will get me to a good place. That said,
having thought about your point, my shell scripting abilities are more
likely to get me to a good place than attempting to add icase pathspec
magic support to ls-tree :)

>
> > I don't see this being something I can take on in my spare time, so
> > for now I suspect I'll have to do a full-tree duplicate-file-search on
> > every ref update, and simply accept the 1-second update hook
> > processing time/delay per pushed ref :(
>
> I don't see why you need to do full-tree with existing options, nor
> why the ls-tree option you want would somehow make it easier to avoid.
> I think you can avoid the full-tree search with something like:
>
>    git diff --diff-filter=A --no-renames --name-only $OLDHASH $NEWHASH |
> sed -e s%/[^/]*$%/% | uniq | xargs git ls-tree --name-only $NEWHASH |
> \
>    sort | uniq -i -d
>
> The final "sort | uniq -i -d" is taken from Torsten's suggestion.
>
> The git diff ... xargs git ls-tree section on the first line will
> provide a list of all files (& subdirs) in the same directory as any
> added file. (Although, it has a blind spot for paths in the toplevel
> directory.)

The theoretical problem with this approach is that it only addresses
case-insensitive-duplicate files, not directories.
Directories have been the problem, in "my" repo, around one-third of
the time - typically someone does a directory rename, and someone else
does a bad merge and reintroduces the old directory.

That said, what "icase pathspec magic" actually *does* is break down
the pathspec into iteratively more complete paths, level by level,
looking for case-duplicates at each level. That's something I could
presumably do in shell scripting: collect all the interesting
sub-paths first, then get ls-tree to tell me about the immediate
children of each sub-path, and do a case-insensitive dupe search
across the children of each of these sub-paths.

ls-tree supporting icase pathspec magic would clearly be more
efficient (I wouldn't need N ls-tree git processes, where N is the
number of sub-paths in the diff), but this should be plenty efficient
for normal commits, with a fallback to the full search.

This seems like a sensible direction, I'll have a play.
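
For illustration, here is a rough, untested-in-anger sketch of that
level-by-level approach. The function names parent_dirs and
find_icase_dupes are made up for this example. It assumes the old and
new hashes are both real commits (so not a new ref), that paths contain
no newlines or characters git would quote, and that uniq supports -i as
in the pipeline quoted above. It uses "sort -f" instead of plain "sort"
so that case variants end up adjacent regardless of locale.

```shell
#!/bin/sh
# Sketch only: report names that collide case-insensitively with a
# sibling, in every directory level leading to a newly added file.

# Read paths on stdin; print each ancestor directory (with a trailing
# slash) once, plus an empty line that stands for the top-level tree.
parent_dirs() {
    while IFS= read -r path; do
        echo ""
        dir=$(dirname "$path")
        while [ "$dir" != "." ]; do
            printf '%s/\n' "$dir"
            dir=$(dirname "$dir")
        done
    done | sort -u
}

# Usage: find_icase_dupes OLDHASH NEWHASH
# Prints one name per group of case-insensitive duplicates found.
find_icase_dupes() {
    old=$1
    new=$2
    git diff --diff-filter=A --no-renames --name-only "$old" "$new" |
    parent_dirs |
    while IFS= read -r dir; do
        # One ls-tree process per sub-path: list its immediate children.
        if [ -z "$dir" ]; then
            git ls-tree --name-only "$new"
        else
            git ls-tree --name-only "$new" "$dir"
        fi | sort -f | uniq -i -d
    done
}
```

In an update hook this would be called as something like
find_icase_dupes "$OLDHASH" "$NEWHASH", rejecting the push if the
output is non-empty. Because the top-level tree and every intermediate
directory are listed, it should catch duplicate directories as well as
duplicate files, at the cost of one git process per sub-path.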