Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems

Jeff King <peff@xxxxxxxx> · Thu, 9 Aug 2018 17:44:30 -0400

On Thu, Aug 09, 2018 at 02:40:58PM -0700, Elijah Newren wrote:

> > I worry that the false positives make this a non-starter.  I mean, if
> > clone creates files 'A' and 'B' (both equal) and then tries to create
> > 'b', would the collision code report that 'b' collided with 'A' because
> > that was the first OID match?  Ideally with this scheme we'd have to
> > search the entire index prior to 'b' and then report that 'b' collided
> > with either 'A' or 'B'.  Neither message instills confidence.  And
> > there's no way to prefer answer 'B' over 'A' without using knowledge
> > of the FS name mangling/aliasing rules -- unless we want to just assume
> > ignore-case for this iteration.
> 
> A possibly crazy idea: Don't bother reporting the other filename; just
> report the OID instead.
> 
> "Error: Foo.txt cannot be checked out because another file with hash
> <whatever> is in the way."  Maybe even add a hint for the user: "Run
> `git ls-files -s` to see all files and their hashes".
> 
> Whatever the exact wording for the error message, just create a nice
> post on stackoverflow.com explaining the various weird filesystems out
> there (VFAT, NTFS, HFS, APFS, etc) and how they cause differing
> filenames to be written to the same location.  Have a bunch of folks
> vote it up so it has some nice search-engine juice.

Actually, I kind of like the simplicity of that. It puts the human brain
in the loop.

> The error message isn't quite as good, but does the user really need
> all the names of the file?  If so, we gave them enough information to
> figure it out, and this is a really unusual case anyway, right?
> Besides, now we're back to linear performance....

Well, it's still quadratic when they run O(n) iterations of "git
ls-files -s | grep $colliding_oid". You've just pushed the second linear
search onto the user. ;)
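
For illustration only (this is not part of the patch): a minimal Python
sketch of how that user-side lookup could stay linear, by grouping the
"git ls-files -s" output by OID in one pass instead of grepping once per
colliding OID. The function names and the sample OIDs are hypothetical;
the only thing assumed about git is the "<mode> <object> <stage>",
tab, "<path>" line format of "git ls-files -s".

```python
from collections import defaultdict


def group_paths_by_oid(ls_files_output):
    """Map each OID to the paths that share it, in a single pass."""
    by_oid = defaultdict(list)
    for line in ls_files_output.splitlines():
        if not line:
            continue
        # Metadata and path are separated by a tab; the metadata is
        # "<mode> <object> <stage>" separated by single spaces.
        meta, path = line.split("\t", 1)
        _mode, oid, _stage = meta.split(" ")
        by_oid[oid].append(path)
    return by_oid


def shared_oids(by_oid):
    """Keep only the OIDs that more than one path points at."""
    return {oid: paths for oid, paths in by_oid.items() if len(paths) > 1}


if __name__ == "__main__":
    # Made-up entries mirroring the 'A', 'B', 'b' example from the thread.
    sample = (
        "100644 aaaa 0\tA\n"
        "100644 aaaa 0\tB\n"
        "100644 aaaa 0\tb\n"
        "100644 bbbb 0\tc\n"
    )
    for oid, paths in shared_oids(group_paths_by_oid(sample)).items():
        print(oid, paths)
```

This only shows which paths share an OID. As in the 'A'/'B'/'b' example
above, sharing an OID does not by itself say which path the filesystem
actually collided with, so the human is still in the loop.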

-Peff