Ceph's ino32 mount option has trivial collisions.

The hash is `ceph_ino_to_ino32` here:
https://github.com/torvalds/linux/blob/master/fs/ceph/super.h#L438

A simple collision can be demonstrated:

def ceph_ino_to_ino32(vino):
    ino = vino & 0xffffffff
    ino ^= vino >> 32
    if not ino:
        ino = 2
    return ino

print(ceph_ino_to_ino32(0x10000000301))  # 0x302'nd inode on mds.0
print(ceph_ino_to_ino32(0x20000000001))  # 2nd inode on mds.1

Both 0x10000000301 and 0x20000000001 hash to ino32=513, so collisions
are very likely when using multiple active MDSs.

So I wondered: if we pin the mount prefix to a single mds, then maybe
collisions are less likely? It seems so, but I still found exactly one
collision in the range (1<<40) to (1<<40)+(1<<25): 0x10000000102 and
0x10000000100 both hash to ino32=2 (a sketch of that scan is below).

Since collisions are inevitable -- are they handled in some sane/safe
way on the mds side? If not -- maybe we should improve or remove the
ino32 kernel option?
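A minimal sketch of such a brute-force scan looks roughly like this
(just an illustration, not necessarily the exact script used;
find_ino32_collisions is only an ad-hoc helper name, not anything from
the kernel or Ceph; the hash is redefined so the snippet runs on its
own):

def ceph_ino_to_ino32(vino):
    # Same hash as the kernel function quoted above.
    ino = vino & 0xffffffff
    ino ^= vino >> 32
    if not ino:
        ino = 2
    return ino

def find_ino32_collisions(start, count):
    seen = {}  # ino32 -> first 64-bit inode number that produced it
    collisions = []
    for vino in range(start, start + count):
        ino32 = ceph_ino_to_ino32(vino)
        if ino32 in seen:
            collisions.append((seen[ino32], vino, ino32))
        else:
            seen[ino32] = vino
    return collisions

# Scan the first 1<<25 inode numbers of the mds.0 range (from 1<<40).
# Note: this keeps ~33M dict entries in memory -- it's only a sketch.
for a, b, ino32 in find_ino32_collisions(1 << 40, 1 << 25):
    print(hex(a), hex(b), "-> ino32 =", ino32)
# prints: 0x10000000100 0x10000000102 -> ino32 = 2

The single collision in that pinned range comes from the "if not ino"
special case: 0x10000000100 hashes to 0, which is remapped to 2 and
then collides with 0x10000000102.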
Cheers, Dan

On Wed, Oct 16, 2019 at 9:48 AM Ingo Schmidt <i.schmidt@xxxxxxxxxxx> wrote:
>
> This is not quite true. The number space of MD5 is much greater than
> 2³² (2¹²⁸, exactly), and as long as you don't exhaust this number
> space, a collision is roughly as likely as with any other input.
> There might be collisions, and the more data you have, i.e. the more
> addresses you use, the higher the probability.
> Security researchers have shown that it is possible to create
> collisions, but it is very rare.
>
> I cannot give you an estimate of the consequences of a collision
> though. It's a matter of what data is stored at that address and how
> programs/OSes and even Ceph deal with this. I would suspect Ceph
> would find a checksum mismatch upon scrubbing. But I don't know how,
> or if, Ceph could or would correct this, as the two addresses with
> the same MD5 sum have equally valid copies, and I think in such a
> case it is undecidable which data is correct.
>
> Greetings
> Ingo
>
> ----- Original Message -----
> From: "Nathan Fish" <lordcirth@xxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxx>
> Sent: Tuesday, 15 October 2019 19:40:05
> Subject: Re: CephFS and 32-bit Inode Numbers
>
> I'm not sure exactly what would happen on an inode collision, but I'm
> guessing Bad Things. If my math is correct, a 2^32 inode space will
> have roughly 1 collision per 2^16 entries. As that's only 65536,
> that's not safe at all.
>
> On Mon, Oct 14, 2019 at 8:14 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >
> > OK I found that the kernel has an "ino32" mount option which hashes
> > 64-bit inos to 32-bit space.
> > Has anyone tried this?
> > What happens if two files collide?
> >
> > -- Dan
> >
> > On Mon, Oct 14, 2019 at 1:18 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > >
> > > Hi all,
> > >
> > > One of our users has some 32-bit commercial software that they
> > > want to use with CephFS, but it's not working because our inode
> > > numbers are too large. E.g. his application gets a "file too big"
> > > error trying to stat inode 0x40008445FB3.
> > >
> > > I'm aware that CephFS offsets the inode numbers by
> > > (mds_rank + 1) * 2^40; in the case above the file is managed by
> > > mds.3.
> > >
> > > Did anyone see this same issue and find a workaround? (I read
> > > that GlusterFS has an enable-ino32 client option -- does CephFS
> > > have something like that planned?)
> > >
> > > Thanks!
> > >
> > > Dan