On Mon, 19 Aug 2024 at 11:53, Nathan Chancellor <nathan@xxxxxxxxxx> wrote: > > > Modules linked in: > Pid: 24, comm: mount Not tainted 6.11.0-rc4-next-20240819 > RIP: 0033:0x68006f6c > RSP: 000000006c8bfc68 EFLAGS: 00010206 > RAX: 0000000068006f6c RBX: 0000000068a0aa18 RCX: 00000000600d8b09 > RDX: 0000000000000000 RSI: 0000000068a0aa18 RDI: 0000000068805120 > RBP: 000000006c8bfc70 R08: 0000000000000001 R09: 0000000068ae0308 > R10: 000000000000000e R11: ffffffffffffffff R12: 0000000000000001 > R13: 0000000068a0aa18 R14: 0000000000000015 R15: 0000000068944a88 > Kernel panic - not syncing: Segfault with no mm > CPU: 0 UID: 0 PID: 24 Comm: mount Not tainted 6.11.0-rc4-next-20240819 #1 > Stack: > 600caeff 6c8bfc90 600d8b2a 68944a80 > 00000047 6c8bfda0 600cbfd9 6c8bfd50 > 68944ad0 68944a88 7f7ffff000 7f7fffffff > Call Trace: > [<600caeff>] ? special_mapping_close+0x16/0x19 Hmm. No "Code:" line? Did you just edit that out, or maybe UML doesn't print one out? Anyway, for me that special_mapping_close() disassembles to <+0>: mov %rdi,%rsi <+3>: mov 0x78(%rdi),%rdi <+7>: mov 0x20(%rdi),%rax <+11>: test %rax,%rax <+14>: je 0x600caa11 <special_mapping_close+24> <+16>: push %rbp <+17>: mov %rsp,%rbp <+20>: call *%rax <+22>: pop %rbp <+23>: ret <+24>: ret which may just match yours, because special_mapping_close+0x16 is obviously that +22, and it's the return point for that call. And your %rax value does match that invalid %rip value of 0x68006f6c. So it does look like it's jumping off to la-la-land, and the problem is the code const struct vm_special_mapping *sm = vma->vm_private_data; if (sm->close) sm->close(sm, vma); where presumably 'vm_private_data' isn't a "struct vm_special_mapping *" at all. And I think I see the problem. When we have that 'legacy_special_mapping_vmops', then the vm_private_data field actually points to 'pages'. So the 'legacy_special_mapping_vmops' can *only* contain the '.fault' handler, not the other handlers. IOW, does something like this fix it? --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2095,7 +2095,6 @@ static const struct vm_operations_struct special_mapping_vmops = { }; static const struct vm_operations_struct legacy_special_mapping_vmops = { - .close = special_mapping_close, .fault = special_mapping_fault, }; and honestly, we should have different 'fault' functions instead of having the same fault function and then making the function dynamically see which vm_operations_struct it was. But that's a separate issue. And oh Christ, the difference between "_install_special_mapping()" (which installs the modern style special mapping) and "install_special_mapping()" (which installs the legacy special mapping) is truly horrendously disgusting. And yes, UML uses that legacy crap, which explains why it happens on UML but not on sane architectures. As does csky, hexagon, and nios2. We should get rid of the legacy case entirely, and remove that insane difference between _install_special_mapping() and install_special_mapping(). But in the meantime, the one-liner fix *should* fix it. The four architectures that uses the legacy crud don't care about the close function anyway. What a horrid thing. Linus