Darrick J. Wong writes:
> On Fri, Dec 20, 2019 at 02:49:36AM +0000, Chris Down wrote:
> > In Facebook production we are seeing heavy inode number wraparounds on
> > tmpfs. On affected tiers, in excess of 10% of hosts show multiple files
> > with different content and the same inode number, with some servers even
> > having as many as 150 duplicated inode numbers with differing file
> > content.
> >
> > This causes actual, tangible problems in production. For example, we
> > have complaints from those working on remote caches that their
> > application is reporting cache corruptions because it uses (device,
> > inodenum) to establish the identity of a particular cache object, but
>
> ...but you cannot delete the (dev, inum) tuple from the cache index when
> you remove a cache object??
There are some cache objects which may be long-lived. In those cases, the
cache objects aren't removed until they're conclusively no longer needed.
Since tmpfs shares the i_ino counter with every other user of get_next_ino,
it's entirely possible to thrash through all 2^32 inode numbers within the
lifetime of a single cache file.
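
To make that concrete, here's a minimal userspace sketch of the identity
scheme described above. The struct, helper and paths are hypothetical
placeholders rather than the actual cache code, but the (st_dev, st_ino)
keying is the part that breaks once the counter wraps:

#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical cache identity key: (device, inode number) only. */
struct cache_key {
	dev_t dev;
	ino_t ino;
};

/* Hypothetical helper: build the identity key for one cache file. */
static int cache_key_for(const char *path, struct cache_key *key)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;

	key->dev = st.st_dev;
	key->ino = st.st_ino;
	return 0;
}

int main(void)
{
	struct cache_key a, b;

	/* Placeholder paths for two long-lived tmpfs cache objects. */
	if (cache_key_for("/dev/shm/obj-a", &a) ||
	    cache_key_for("/dev/shm/obj-b", &b))
		return 1;

	/*
	 * After an i_ino wraparound, this can be true for two files with
	 * completely different contents, which the application (reasonably)
	 * treats as cache corruption.
	 */
	if (a.dev == b.dev && a.ino == b.ino)
		printf("identity collision: same (dev, ino) for both objects\n");

	return 0;
}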
> > because it's not unique any more, the application refuses to continue
> > and reports cache corruption. Even worse, sometimes applications may not
> > even detect the corruption but may continue anyway, causing phantom and
> > hard to debug behaviour.
> >
> > In general, userspace applications expect that (device, inodenum) should
> > be enough to uniquely point to one inode, which seems fair enough.
> Except that it's not. (dev, inum, generation) uniquely points to an
> instance of an inode from creation to the last unlink.
I didn't mention generation because, even though it's set on tmpfs (to
prandom_u32()), it's not possible to read it from userspace, since the
ioctl to retrieve it returns ENOTTY. We can't ask userspace applications
to introspect on an inode attribute that they can't even access :-)
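
For reference, this is roughly what that looks like from userspace. I'm
assuming FS_IOC_GETVERSION here, since that's the usual interface for
reading i_generation, and the path is just a placeholder for some file
on a tmpfs mount:

#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(int argc, char **argv)
{
	int gen = 0;
	/* Placeholder path: any file on a tmpfs mount. */
	int fd = open(argc > 1 ? argv[1] : "/dev/shm/testfile", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* On tmpfs this fails with ENOTTY, so generation is unusable here. */
	if (ioctl(fd, FS_IOC_GETVERSION, &gen) < 0)
		printf("FS_IOC_GETVERSION: %s\n", strerror(errno));
	else
		printf("i_generation: %d\n", gen);

	close(fd);
	return 0;
}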