Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems

On Thu, Aug 9, 2018 at 2:14 PM Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:
> On 8/9/2018 10:23 AM, Jeff King wrote:
> > On Wed, Aug 08, 2018 at 05:41:10PM -0700, Junio C Hamano wrote:
> >> If we found that there is something when we tried to write out
> >> "Foo.txt", if we open "Foo.txt" on the working tree and hash-object
> >> it, we should find the matching blob somewhere in the index _before_
> >> "Foo.txt".  On a case-insensitive filesytem, it may well be
> >> "foo.txt", but we do not even have to know "foo.txt" and "Foo.txt"
> >> only differ in case.
> >
> > Clever. You might still run into false positives when there is
> > duplicated content in the repository (especially, say, zero-length
> > files).  But the fact that you only do the hashing on known duplicates
> > helps with that.
>
> I worry that the false positives make this a non-starter.  I mean, if
> clone creates files 'A' and 'B' (both equal) and then tries to create
> 'b', would the collision code report that 'b' collided with 'A' because
> that was the first OID match?  Ideally with this scheme we'd have to
> search the entire index prior to 'b' and then report that 'b' collided
> with either 'A' or 'B'.  Neither message instills confidence.  And
> there's no way to prefer answer 'B' over 'A' without using knowledge
> of the FS name mangling/aliasing rules -- unless we want to just assume
> ignore-case for this iteration.

A possibly crazy idea: Don't bother reporting the other filename; just
report the OID instead.

"Error: Foo.txt cannot be checked out because another file with hash
<whatever> is in the way."  Maybe even add a hint for the user: "Run
`git ls-files -s` to see all files and their hashes".
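
Something along these lines, perhaps (just a sketch; I'm borrowing
git's usual error()/advise() style, and the call site is hypothetical):

/* Sketch: report only the path and the blob OID that is in the way. */
static void report_collision(const struct cache_entry *ce,
			     const struct object_id *on_disk)
{
	error(_("'%s' cannot be checked out because another file with "
		"hash %s is in the way"),
	      ce->name, oid_to_hex(on_disk));
	advise(_("run 'git ls-files -s' to see all files and their hashes"));
}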

Whatever the exact wording for the error message, just create a nice
post on stackoverflow.com explaining the various weird filesystems out
there (VFAT, NTFS, HFS, APFS, etc.) and how they cause differing
filenames to be written to the same location.  Have a bunch of folks
vote it up so it has some nice search-engine juice.


The error message isn't quite as good, but does the user really need
all the colliding filenames?  If so, we've given them enough information to
figure it out, and this is a really unusual case anyway, right?
Besides, now we're back to linear performance....


