Re: [PATCH v6 06/17] fsnotify: generate pre-content permission event on open

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Mon, 11 Nov 2024 15:39:45 -0800

On Mon, 11 Nov 2024 at 15:22, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> See why I'm shouting? You're doing insane things, and you're doing
> them for all the cases that DO NOT MATTER. You're doing all of this
> for the common case that doesn't want to see that kind of mindless
> overhead.

Side note: I think as filesystem people, you guys are taught to think
"IO is expensive, as long as you can avoid IO, things go fast".

And that's largely true at a filesystem level.

But on the VFS level, the common case is actually "everything is
cached in memory, we're never calling down to the filesystem at all".

And then IO isn't the issue.

So on a VFS level, to a very close approximation, the only thing that
matters is cache misses and mispredicted branches.

(Indirect calls have always had some overhead, and Spectre made it
much worse, so arguably indirect calls have become the third thing
that matters).

So in the VFS layer, we have ridiculous tricks like

        if (unlikely(!(inode->i_opflags & IOP_FASTPERM))) {
                if (likely(inode->i_op->permission))
                        return inode->i_op->permission(idmap, inode, mask);

                /* This gets set once for the inode lifetime */
                spin_lock(&inode->i_lock);
                inode->i_opflags |= IOP_FASTPERM;
                spin_unlock(&inode->i_lock);
        }
        return generic_permission(idmap, inode, mask);

in do_inode_permission, because it turns out that the IOP_FASTPERM
flag means that we literally don't even need to dereference
inode->i_op->permission (nasty chain of D$ accesses), and we can
*only* look at accesses off the 'inode' pointer.

Is this an extreme example? Yes. But the whole i_opflags kind of thing
does end up mattering, exactly because it keeps the D$ footprint
smaller.

                  Linus