Re: [PATCH] eventfs: Have inodes have unique inode numbers

Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> · Fri, 26 Jan 2024 17:14:12 -0500

On 2024-01-26 16:49, Linus Torvalds wrote:
On Fri, 26 Jan 2024 at 13:36, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
[...]
So please try to look at things to *fix* and simplify, not at things
to mess around with and make more complicated.

Hi Linus,

I'm all aboard with making things as simple as possible and
making sure no complexity is added for the sake of micro-optimization
of slow-paths.

I do however have a concern with the approach of using the same
inode number for various files on the same filesystem: AFAIU it
breaks userspace ABI expectations. See inode(7) for instance:

       Inode number
              stat.st_ino; statx.stx_ino

              Each  file in a filesystem has a unique inode number.  Inode numbers
              are guaranteed to be unique only within a filesystem (i.e., the same
              inode  numbers  may  be  used by different filesystems, which is the
              reason that hard links may not cross filesystem  boundaries).   This
              field contains the file's inode number.

So user-space expecting inode numbers to be unique within a filesystem
is not "legacy" in any way. Userspace is allowed to expect this from the
ABI.

I think that a safe approach to prevent ABI regressions, and just to prevent
adding more ABI-corner cases that userspace will have to work-around, would
be to issue unique numbers to files within eventfs, but in the
simplest/obviously correct implementation possible. It is, after all, a slow
path.

The issue with the atomic_add_return without any kinds of checks is the
scenarios of a userspace loop that would create/delete directories endlessly,
thus causing inode re-use. This approach is simple, but it's unfortunately
not obviously correct. Because eventfs allows userspace to do mkdir/rmdir,
this is unfortunately possible. It would be OK if only the kernel had control
over directory creation/removal, but it's not the case here.

I would suggest this straightforward solution to this:

a) define a EVENTFS_MAX_INODES (e.g. 4096 * 8),

b) keep track of inode allocation in a bitmap (within a single page),

c) disallow allocating more than "EVENTFS_MAX_INODES" in eventfs.

This way even the mkdir/rmdir loop will work fine, but it will prevent
keeping too many inodes alive at any given time. The cost is a single
page (4K) per eventfs instance.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com