Re: icase pathspec magic support in ls-tree


 



On Fri, Oct 14, 2022 at 9:41 AM Elijah Newren <newren@xxxxxxxxx> wrote:
>
>
> How exactly would case-insensitive matching in ls-tree help you here?

I just attempted to demonstrate this in my reply to Torsten's email; the
assumption was that I could list the added files' paths in a
case-insensitive pathspec, and thereby find all duplicates quickly and
efficiently for reasonably-sized commits.

(New refs and very large commits would still need a full-tree duplicate scan.)

> Can't you write a hook without such capability that rejects such
> collisions?

It is possible, but far less convenient, and I'm not confident that my
shell scripting abilities will get me to a good place.

That said, having thought about your point, my shell scripting
abilities are more likely to get me to a good place than attempting to
add icase pathspec magic support to ls-tree :)

>
> > I don't see this being something I can take on in my spare time, so
> > for now I suspect I'll have to do a full-tree duplicate-file-search on
> > every ref update, and simply accept the 1-second update hook
> > processing time/delay per pushed ref :(
>
> I don't see why you need to do full-tree with existing options, nor
> why the ls-tree option you want would somehow make it easier to avoid.
> I think you can avoid the full-tree search with something like:
>
> git diff --diff-filter=A --no-renames --name-only $OLDHASH $NEWHASH |
>    sed -e 's%/[^/]*$%/%' | uniq | xargs git ls-tree --name-only $NEWHASH |
>    sort | uniq -i -d
>
> The final "sort | uniq -i -d" is taken from Torsten's suggestion.
>
> The git diff ... xargs git ls-tree section on the first line will
> provide a list of all files (& subdirs) in the same directory as any
> added file.  (Although, it has a blind spot for paths in the toplevel
> directory.)
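One caveat with that final "sort | uniq -i -d": uniq only compares
adjacent lines, so the input needs a sort that keeps case variants next
to each other. Whether a plain sort does that depends on the locale the
hook runs under; in the C locale it orders by byte value, so "Foo",
"bar", "foo" come out in that order and uniq -i sees nothing. A minimal
sketch, assuming GNU or BSD sort/uniq (uniq -i is not in POSIX) and that
ASCII-only case folding is good enough; case_dupes is just a name I made
up:

```shell
#!/bin/sh
# Print one name from each group of names that differ only in ASCII case.
# sort -f folds case when ordering, so case variants end up adjacent,
# which is what uniq -i -d relies on; LC_ALL=C keeps the order
# independent of the hook's locale.
case_dupes() {
    LC_ALL=C sort -f | LC_ALL=C uniq -i -d
}

# Plain byte-order sort: "Foo" and "foo" are separated by "bar",
# so nothing is reported.
printf 'Foo\nbar\nfoo\n' | LC_ALL=C sort | LC_ALL=C uniq -i -d

# Case-folding sort: the pair is adjacent and one of them is reported.
printf 'Foo\nbar\nfoo\n' | case_dupes
```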

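The theoretical problem with this approach is that it only catches
files that are case-insensitive duplicates, not directories: it lists
only the directory that directly contains each added file, so a newly
added "Foo/" next to an existing "foo/" is never compared against it.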

In "my" repo, directories have been the problem around one-third of
the time: typically someone renames a directory, and someone else
does a bad merge that reintroduces the old one.

That said, what "icase pathspec magic" actually *does* is break down
the pathspec into progressively longer paths, level by level, looking
for case-duplicates at each level. That's something I could
presumably do in a shell script: collect all the interesting
sub-paths first, then have ls-tree list the immediate children of
each sub-path, and search for case-insensitive duplicates among the
children of each one.
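Roughly what I have in mind, as a sketch rather than anything I've run
in a real hook. It assumes it runs from the top of the repository (an
update hook in a bare repo runs in $GIT_DIR), that the added paths
arrive on stdin one per line, and that no path is one git would quote
(core.quotePath). dirs_to_check and check_case_dupes are names I've
made up:

```shell
#!/bin/sh
# Print every ancestor directory of the paths read on stdin, once each.
# The toplevel tree is represented as ".".
dirs_to_check() {
    while IFS= read -r path; do
        dir=$path
        while :; do
            case $dir in
                */*) dir=${dir%/*} ;;  # strip the last path component
                *)   dir=. ;;          # reached the toplevel
            esac
            printf '%s\n' "$dir"
            if [ "$dir" = . ]; then
                break
            fi
        done
    done | LC_ALL=C sort -u
}

# For each ancestor directory, list its immediate children in revision $1
# (one ls-tree process per directory) and print names that differ only
# in ASCII case.
check_case_dupes() {
    rev=$1
    dirs_to_check | while IFS= read -r dir; do
        if [ "$dir" = . ]; then
            git ls-tree --name-only "$rev"
        else
            git ls-tree --name-only "$rev" "$dir/"
        fi | LC_ALL=C sort -f | LC_ALL=C uniq -i -d
    done
}
```

In an update hook this would be fed with something like
git diff --diff-filter=A --no-renames --name-only "$2" "$3" | check_case_dupes "$3",
rejecting the push if anything is printed. Listing every ancestor's
children is what catches a reintroduced "Foo/" next to an existing
"foo/": the listing of their common parent contains both.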

Support for icase pathspec magic in ls-tree would clearly be more
efficient (I wouldn't need N ls-tree processes, where N is the number
of sub-paths in the diff), but this approach should be plenty efficient
for normal commits, with a fallback to the full-tree search for new
refs and very large commits.
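And the fallback, again only a sketch: an update hook gets the old and
new object names as $2 and $3, with an all-zero name as the old value
when a ref is created and as the new value when it is deleted.
MAX_ADDED is a made-up threshold for "very large", and is_null_oid,
scan_mode and full_tree_case_dupes are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical threshold for a "very large" commit; tune per repository.
MAX_ADDED=${MAX_ADDED:-1000}

# Succeeds if $1 is a non-empty, all-zero object name.
is_null_oid() {
    case $1 in
        '' | *[!0]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage: scan_mode OLD NEW N_ADDED
# Prints "skip" (ref deletion), "full" (new ref or large commit)
# or "incremental" (normal commit).
scan_mode() {
    if is_null_oid "$2"; then
        echo skip
    elif is_null_oid "$1" || [ "$3" -gt "$MAX_ADDED" ]; then
        echo full
    else
        echo incremental
    fi
}

# Full-tree fallback: every path in revision $1, tree entries included
# (-t), checked for names that differ only in ASCII case.
full_tree_case_dupes() {
    git ls-tree -r -t --name-only "$1" | LC_ALL=C sort -f | LC_ALL=C uniq -i -d
}
```

The added-path count could come from piping the same
git diff --diff-filter=A --no-renames --name-only output through wc -l.
Passing -t alongside -r keeps tree entries in the listing, so the full
scan reports directory collisions as well as file ones.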

This seems like a sensible direction; I'll have a play.


