Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems

On Wed, Aug 08, 2018 at 05:41:10PM -0700, Junio C Hamano wrote:

> > If we have an equivalence-class hashmap and feed it inodes (or again,
> > some system equivalent) as the keys, we should get buckets of
> > collisions.
> 
> I guess one way to get "some system equivalent" that can be used as
> the last resort, when there absolutely is no inum equivalent, is to
> rehash the working tree file that shouldn't be there when we detect
> a collision.
> 
> If we found that there is something when we tried to write out
> "Foo.txt", if we open "Foo.txt" on the working tree and hash-object
> it, we should find the matching blob somewhere in the index _before_
> "Foo.txt".  On a case-insensitive filesystem, it may well be
> "foo.txt", but we do not even have to know "foo.txt" and "Foo.txt"
> only differ in case.

Clever. You might still run into false positives when there is
duplicated content in the repository (especially, say, zero-length
files).  But the fact that you only do the hashing on known duplicates
helps with that.
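
To make the rehash idea concrete, here is a rough sketch in Python rather than git's C. The function names and the list-of-tuples "index" are made up for illustration and are not git's internal API; the only thing taken from git itself is the blob naming scheme (SHA-1 over a "blob <size>\0" header plus the content).

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object name git gives a blob: SHA-1 over 'blob <size>\\0' + content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def earlier_entries_with_same_content(index, colliding_path, worktree_bytes):
    """Find index entries before colliding_path whose blob matches the disk.

    index is a hypothetical stand-in for the real index: a list of
    (path, blob_id) tuples in checkout order. worktree_bytes is what we
    read back from the file that unexpectedly already existed.
    """
    on_disk = git_blob_id(worktree_bytes)
    matches = []
    for path, blob_id in index:
        if path == colliding_path:
            break  # only entries written out before the collision count
        if blob_id == on_disk:
            matches.append(path)
    return matches
```

If "foo.txt" was checked out first, then reading back "Foo.txt" yields the content of "foo.txt", and the search reports "foo.txt" without ever comparing names. An unrelated earlier entry with identical content (an empty file, say) would be reported too, which is the false positive mentioned above.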

One of the things I did like about the equivalence-class approach is
that it can be done in a single linear pass in the worst case. Whereas
anything that searches when we see a collision is quite likely to be
quadratic. But as I said before, it may not be worth worrying too much
about that for an error code path where we expect the number of
collisions to be small.
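
For comparison, a minimal sketch of the equivalence-class pass, again in Python and with hypothetical names. It assumes the platform gives us a usable (st_dev, st_ino) pair and that distinct checked-out paths normally get distinct inodes; where no inum equivalent exists, a different key function would have to be swapped in.

```python
import os
from collections import defaultdict


def inode_key(path):
    """Default key: (device, inode) of the path, without following symlinks."""
    st = os.lstat(path)
    return (st.st_dev, st.st_ino)


def collision_classes(paths, key=inode_key):
    """Bucket paths by key in one pass and return buckets with more than one path.

    On a case-insensitive filesystem, "foo.txt" and "Foo.txt" resolve to
    the same file and therefore land in the same bucket.
    """
    buckets = defaultdict(list)
    for path in paths:
        buckets[key(path)].append(path)
    return [group for group in buckets.values() if len(group) > 1]
```

Each path is stat'ed and inserted once, so the cost stays linear in the number of entries no matter how many collisions there are.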

-Peff


