On Thu, 31 Oct 2024 at 09:04, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Maybe. Part of the cost seems to be the call, but a bigger part seems
> to be the memory accesses around it with that whole
> inode->i_sb->s_user_ns chain to load it, and then current->cred->fsuid
> to compare against the result.
>
> Anyway, I'll play around with this a bit more and try to get better profiles.

Ok, so I've done some more profiles, and yeah, the costs seem to be almost entirely just cache misses.

make_vfsuid() shows up on the profile a bit too, but that seems to really be literally just "it's called a lot, and takes some I$ misses".

When the profile looks like this:

  10.71 │      push   %rbx
  15.44 │      mov    %edx,%eax
   7.88 │      mov    %rdi,%rbx
        │      cmp    $0xffffffff82532a00,%rdi
   9.65 │    ↓ je     3a
  ... nothing ...
  15.00 │ffffffff813493fa:   pop    %rbx
  41.33 │ffffffff813493fb: → jmp    ffffffff81bb5460 <__x86_return_thunk>

then it really looks like cache misses and some slow indirect branch resolution for 'ret' (that __x86_return_thunk branch is misleading: it is rewritten at runtime, but 'perf report' shows the original object code).

So some of it is presumably due to IBRS on this CPU, and that's a big part of make_vfsuid() in this bare-metal non-idmapped case.

But the profile does clearly say that in generic_permission(), the problem is just the cache misses on the superblock accesses and all the extra work with the credential check, even when the mapping is the identity mapping.

I still get a fair number of calls to make_vfsuid() even with my patch, and I guess I'll have to look into why. This is my regular "full build of an already built kernel", which I *expected* to be mainly just a lot of fstat() calls by 'make', which I would have thought almost always hit the fast case. I might have messed something up.

            Linus
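To make the cost argument concrete, here is a small userspace model of the two code shapes being discussed. It is only a sketch under stated assumptions: the structs are simplified, hypothetical stand-ins for the kernel's inode, super_block, cred and mnt_idmap (only the field names i_sb, s_user_ns, i_uid and fsuid, and the nop_mnt_idmap sentinel name, mirror the kernel). The helpers map_uid(), owner_matches_eager(), owner_matches_fast(), the shift field and the sb_chases counter are invented for illustration. This is not the patch mentioned above and not the real make_vfsuid(). The "eager" shape must load inode->i_sb->s_user_ns to pass it as an argument before the identity check inside the callee can short-circuit anything. The "fast" shape checks for the identity idmap first and compares the raw uid, so it touches the superblock only when a real mapping is involved.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for kernel types. */
typedef unsigned int kuid_t;

struct user_namespace { int level; };              /* placeholder only */
struct super_block { struct user_namespace *s_user_ns; };
struct inode { struct super_block *i_sb; kuid_t i_uid; };
struct cred { kuid_t fsuid; };
struct mnt_idmap { kuid_t shift; };                /* hypothetical mapping */

static struct user_namespace init_user_ns;
static struct mnt_idmap nop_mnt_idmap;             /* identity-mapping sentinel */

/* Counts how often the model chased inode->i_sb->s_user_ns. */
static int sb_chases;

/* Toy mapping: identity for the sentinel, otherwise subtract a shift.
 * The real code also translates through fs_userns; omitted here. */
static kuid_t map_uid(const struct mnt_idmap *idmap,
                      const struct user_namespace *fs_userns, kuid_t uid)
{
    (void)fs_userns;
    if (idmap == &nop_mnt_idmap)
        return uid;
    return uid - idmap->shift;
}

/* Eager shape: the dependent loads inode->i_sb->s_user_ns happen
 * unconditionally, just to build the argument list. */
static bool owner_matches_eager(const struct mnt_idmap *idmap,
                                const struct inode *inode,
                                const struct cred *cred)
{
    const struct user_namespace *ns = inode->i_sb->s_user_ns;

    sb_chases++;
    return map_uid(idmap, ns, inode->i_uid) == cred->fsuid;
}

/* Fast-path shape: for the identity idmap, compare the raw uid and
 * never touch the superblock. */
static bool owner_matches_fast(const struct mnt_idmap *idmap,
                               const struct inode *inode,
                               const struct cred *cred)
{
    if (idmap == &nop_mnt_idmap)
        return inode->i_uid == cred->fsuid;
    return owner_matches_eager(idmap, inode, cred);
}
```

In this toy, everything is visible to the compiler, so it could reorder or elide loads; the model shows the code shape and the sb_chases counter, not a measured cache-miss cost. Note that both shapes still load cred->fsuid, which matches the point above that the credential check itself is part of the cost.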