On 2024-01-26 16:49, Linus Torvalds wrote:
On Fri, 26 Jan 2024 at 13:36, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
[...]
So please try to look at things to *fix* and simplify, not at things to mess around with and make more complicated.
Hi Linus, I'm all aboard with making things as simple as possible and making sure no complexity is added for the sake of micro-optimization of slow-paths. I do however have a concern with the approach of using the same inode number for various files on the same filesystem: AFAIU it breaks userspace ABI expectations. See inode(7) for instance: Inode number stat.st_ino; statx.stx_ino Each file in a filesystem has a unique inode number. Inode numbers are guaranteed to be unique only within a filesystem (i.e., the same inode numbers may be used by different filesystems, which is the reason that hard links may not cross filesystem boundaries). This field contains the file's inode number. So user-space expecting inode numbers to be unique within a filesystem is not "legacy" in any way. Userspace is allowed to expect this from the ABI. I think that a safe approach to prevent ABI regressions, and just to prevent adding more ABI-corner cases that userspace will have to work-around, would be to issue unique numbers to files within eventfs, but in the simplest/obviously correct implementation possible. It is, after all, a slow path. The issue with the atomic_add_return without any kinds of checks is the scenarios of a userspace loop that would create/delete directories endlessly, thus causing inode re-use. This approach is simple, but it's unfortunately not obviously correct. Because eventfs allows userspace to do mkdir/rmdir, this is unfortunately possible. It would be OK if only the kernel had control over directory creation/removal, but it's not the case here. I would suggest this straightforward solution to this: a) define a EVENTFS_MAX_INODES (e.g. 4096 * 8), b) keep track of inode allocation in a bitmap (within a single page), c) disallow allocating more than "EVENTFS_MAX_INODES" in eventfs. This way even the mkdir/rmdir loop will work fine, but it will prevent keeping too many inodes alive at any given time. The cost is a single page (4K) per eventfs instance. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com