On Fri, Oct 14, 2022 at 6:51 AM Torsten Bögershausen <tboegi@xxxxxx> wrote:
>
> On Thu, Oct 13, 2022 at 08:35:11AM +0200, Tao Klerks wrote:
>
> Did you ever consider to write a shell script,
> that can detect icase-collisions ?
>
> For example, we can use Linux:
> git ls-files | tr 'A-Z' 'a-z' | sort | uniq -d ; echo $?
> include/uapi/linux/netfilter_ipv4/ipt_ecn.h
> include/uapi/linux/netfilter_ipv4/ipt_ttl.h
> [snip the other files]
>
> The GNU versions of uniq allow an even shorter command,
> (But the POSIX versions don't)
>
> git ls-files | sort | uniq -i -d
>
> I think that a script like this could do the trick:
>
> #!/bin/sh
> ret=1
> >/tmp/$$-exp
> git ls-files | sort | uniq -i -d >/tmp/$$-act &&
> cmp /tmp/$$-exp /tmp/$$-act &&
> ret=0
> rm -f /tmp/$$-exp /tmp/$$-act
> exit $ret
>
>
> ####################
> The usage of files in /tmp is probably debatable,
> I want just illustrate how a combination of shell
> scripts in combination with existing commands can be used.
>
> The biggest step may be to introduce a server-side hook
> that does a check.
> But once that is done and working, you probably do
> not want to miss it.

Thanks for the proposal!

Sorry I was a bit vague in my "I suspect I'll have to do a full-tree
duplicate-file-search on every ref update", but your suggestion is
almost exactly what I meant.

On my machine, on this repo, a full-tree case-insensitive duplicate
search costs me about 800ms for 100k files, or 1,800ms for 200k files:

    git ls-tree --name-only -r $NEWHASH | sort | uniq -i -d

I need to use ls-tree rather than ls-files because this is indeed a
command to run in an update hook, and there is no working tree - no
(relevant) index, in a server-side update hook.

The 800ms for 100k files are composed of 200ms of ls-tree, 600ms of
sort, and about 10ms of uniq.

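
For illustration, the full-tree check described above could be wrapped for
use in an update hook roughly as follows. This is only a sketch under
stated assumptions: the function names find_icase_dupes and check_new_tree
are made up for this example, "sort -f" is added so that colliding names
end up adjacent whatever the locale, "uniq -i" is a GNU/BSD extension
rather than POSIX, and only ASCII case differences are considered.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of git.

# Read one path per line on stdin and print one representative of each
# group of paths that differ only by (ASCII) case.
find_icase_dupes() {
    # sort -f folds case while comparing, so colliding names are adjacent;
    # uniq -i -d then prints the first line of every repeated group.
    LC_ALL=C sort -f | uniq -i -d
}

# Check the tree of a commit; return non-zero if any collision exists.
# Needs a git repository, but no working tree and no index.
check_new_tree() {
    newrev=$1
    dupes=$(git ls-tree --name-only -r "$newrev" | find_icase_dupes)
    if [ -n "$dupes" ]; then
        echo "case-insensitive path collisions found:" >&2
        echo "$dupes" >&2
        return 1
    fi
    return 0
}

# In an update hook (arguments: refname, old rev, new rev) this could be
# called as:
#   check_new_tree "$3" || exit 1
# A ref deletion passes an all-zero new rev and would need to be skipped.
```

The pipeline is the same as the one timed above, apart from the added
"sort -f", so the cost should still be dominated by ls-tree and sort.
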
My intent with supporting icase pathspec magic was to do something like:

    git --icase-pathspecs ls-tree --name-only -r $NEWHASH -- PATHS OF ADDED FILES | sort | uniq -i -d

Which would be near-instantaneous in the vast majority of cases (and
I'd have some file count limit past which I would fall back to doing
the full tree, to avoid excessive command lengths).

Unfortunately, "--icase-pathspecs" is not supported in ls-tree, hence
this thread :)

But yes - ultimately, paying that "full dupe search" per-update server
hook processing time cost has seemed like the only sensible way of
doing this - until I thought about Elijah's suggestion a little harder
that is! More in the next part of the thread.
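
To make the "file count limit" idea concrete, here is a rough sketch of how
a hook might list the added paths and decide between the narrow and the
full-tree check. The names added_paths, pick_scope and MAX_PATHS, and the
default of 500, are assumptions for this example; the message gives no
number. The narrow check itself stays hypothetical, since ls-tree does not
honour icase pathspecs as noted above.

```shell
#!/bin/sh
# Sketch only: names and the default limit are hypothetical.
MAX_PATHS=${MAX_PATHS:-500}

# List paths added between two commits. Both revisions must exist, so a
# newly created ref (all-zero old rev) needs the full-tree check instead.
added_paths() {
    git diff-tree -r --no-commit-id --name-only --diff-filter=A "$1" "$2"
}

# Read added paths on stdin and print which check to run:
#   none   - nothing was added, nothing to check
#   narrow - few enough paths to pass on a command line
#   full   - too many paths, fall back to the full-tree search
pick_scope() {
    count=$(wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo none
    elif [ "$count" -le "$MAX_PATHS" ]; then
        echo narrow
    else
        echo full
    fi
}

# Possible use inside an update hook (old rev in $2, new rev in $3):
#   scope=$(added_paths "$2" "$3" | pick_scope)
```
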