Re: icase pathspec magic support in ls-tree

On Fri, Oct 14, 2022 at 6:51 AM Torsten Bögershausen <tboegi@xxxxxx> wrote:
>
> On Thu, Oct 13, 2022 at 08:35:11AM +0200, Tao Klerks wrote:
>
> Did you ever consider to write a shell script,
> that can detect icase-collisions ?
>
> For example, we can use Linux:
>  git ls-files | tr 'A-Z' 'a-z' | sort | uniq -d ; echo $?
>  include/uapi/linux/netfilter_ipv4/ipt_ecn.h
>  include/uapi/linux/netfilter_ipv4/ipt_ttl.h
>  [snip the other files]
>
> The GNU versions of uniq allow an even shorter command,
> (But the POSIX versions don't)
>
> git ls-files  | sort | uniq -i -d
>
> I think that a script like this could do the trick:
>
> #!/bin/sh
> ret=1
> >/tmp/$$-exp
> git ls-files  | sort | uniq -i -d >/tmp/$$-act &&
>   cmp /tmp/$$-exp /tmp/$$-act &&
>     ret=0
>     rm -f /tmp/$$-exp /tmp/$$-act
>     exit $ret
>
>
> ####################
> The usage of files in /tmp is probably debatable,
> I want just illustrate how a combination of shell
> scripts in combination with existing commands can be used.
>
> The biggest step may be to introduce a server-side hook
> that does a check.
> But once that is done and working, you probably do
> not want to miss it.

Thanks for the proposal! Sorry, I was a bit vague when I wrote "I suspect
I'll have to do a full-tree duplicate-file-search on every ref update", but
your suggestion is almost exactly what I meant.

On my machine, on this repo, a full-tree case-insensitive duplicate
search costs me about 800ms for 100k files, or 1,800ms for 200k files:

git ls-tree --name-only -r $NEWHASH | sort | uniq -i -d

I need to use ls-tree rather than ls-files because this command is meant
to run in a server-side update hook, where there is no working tree and
no (relevant) index.

The 800ms for 100k files are composed of 200ms of ls-tree, 600ms of
sort, and about 10ms of uniq.
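
A minimal sketch of the full-tree check as a reusable function, for
illustration only. The name icase_dupes is made up here. It folds case
with tr (ASCII only, as in Torsten's first example) before sorting, so
it does not need the non-POSIX "uniq -i". Folding first also matters for
correctness if the sort runs in the C locale: uniq only compares
adjacent lines, and a byte-order sort does not necessarily put "A/x"
next to "a/x". Running sort under LC_ALL=C may reduce the sort time
noted above, but that is an assumption I have not measured on this repo.

```shell
#!/bin/sh
# icase_dupes (hypothetical helper): read one path per line on stdin and
# print each lower-cased path that occurs more than once.
# Assumptions: ASCII-only case folding; no newlines inside path names.
icase_dupes() {
    # Fold case first so colliding paths become byte-identical, then
    # sort in byte order so identical lines are adjacent for uniq -d.
    tr 'A-Z' 'a-z' | LC_ALL=C sort | uniq -d
}
```

In an update hook, where the new commit hash arrives as the third
argument, this could be fed with
git ls-tree --name-only -r "$3" | icase_dupes
and the update rejected whenever the output is non-empty.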

My intent with supporting icase pathspec magic was to do something like:

git --icase-pathspecs ls-tree --name-only -r $NEWHASH -- <paths of added files> | sort | uniq -i -d

Which would be near-instantaneous in the vast majority of cases (and
I'd have some file count limit past which I would fall back to doing
the full tree, to avoid excessive command lengths). Unfortunately,
"--icase-pathspecs" is not supported in ls-tree, hence this thread :)
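
The fallback decision could look roughly like the sketch below. The
function name scan_mode and the default limit of 500 are placeholders
invented for this example; I have not settled on an actual number. It
only decides which strategy to use. The "targeted" branch would still
depend on icase pathspec support that ls-tree does not have today.

```shell
#!/bin/sh
# scan_mode (hypothetical helper): read added paths on stdin, one per
# line, and print "targeted" if there are few enough to pass on a
# command line, or "full" if the whole tree should be scanned instead.
# $1 is the path-count limit; 500 is an arbitrary placeholder default.
scan_mode() {
    limit=${1:-500}
    # wc -l may pad its output with spaces on some systems; strip them.
    count=$(wc -l | tr -d ' ')
    if [ "$count" -le "$limit" ]; then
        echo targeted
    else
        echo full
    fi
}
```

The list of added paths could come from something like
git diff-tree -r --name-only --diff-filter=A "$OLDHASH" "$NEWHASH"
although a newly created ref (all-zero old hash) would need separate
handling.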

But yes - ultimately, paying the processing-time cost of that "full dupe
search" in the server hook on every update had seemed like the only
sensible way of doing this - until I thought a little harder about
Elijah's suggestion, that is!

More in the next part of the thread.



