Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 8/7/2018 3:31 PM, Junio C Hamano wrote:
Nguyễn Thái Ngọc Duy  <pclouds@xxxxxxxxx> writes:

  One nice thing about this is we don't need platform specific code for
  detecting the duplicate entries. I think ce_match_stat() works even
  on Windows. And it's now equally expensive on all platforms :D

ce_match_stat() may not be a very good measure to see if two paths
refer to the same file, though.  After a fresh checkout, I would not
be surprised if two completely unrelated paths have the same size
and have same mtime/ctime.  In its original use case, i.e. "I have
one specific path in mind and took a snapshot of its metadata
earlier.  Is it still the same, or has it been touched?", that may
be sufficient to detect that the path has been _modified_, but
without reliable inum, it may be a poor measure to say two paths
refer to the same.

I agree with Junio on this one.  The mtime values are sloppy at best.
On FAT file systems, they have 2 second resolution.  Even NTFS IIRC
has only 100ns resolution, so we might get a lot of false matches
using this technique, right?

It might be better to build an equivalence-class hash-map for the
colliding entries.  Compute a "normalized" version of the pathname
(such as convert to lowercase, strip final-dots/spaces, strip the
digits following tilda of a shortname, and etc for the MAC's UTF-isms).
Then when you rescan the index entries to find the matches, apply the
equivalence operator on the pathname and do the hashmap lookup.
When you find a match, you have a "potential" collider pair (I say
potential only because of the ambiguity of shortnames).  Then we
can use inum/file-index/whatever to see if they actually collide.


Jeff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux