On Wed, 07 Jul 2021, Daire Byrne wrote:
> On Sun, 4 Jul 2021 at 00:03, NeilBrown <neilb@xxxxxxx> wrote:
> >
> > > [ 360.481824] ------------[ cut here ]------------
> > > [ 360.483141] kernel BUG at mm/slub.c:4205!
> >
> > Thanks for testing!
> >
> > It misunderstood the use of kfree_const(). It doesn't work for
> > constants in modules, only constants in vmlinux. So I guess you built
> > nfs as a module.
> >
> > This version should fix that.
> >
> > Thanks,
> > NeilBrown
>
> Yep, that was the issue and the latest patch certainly helped. I ran a
> few load tests and everything seemed to be working fine.
>
> However, once I tried mounting the same server again using a different
> namespace, I got a different looking crash under moderate load. I am
> pretty sure I applied your latest patch correctly, but I'll double
> check. I should probably remove some of the other patches I have
> applied too.
>
> # mount -o vers=4.2 server:/srv/export /mnt/server1
> # mount -o vers=4.2,namespace=server2 server:/srv/export /mnt/server2
>
> [ 3626.638077] general protection fault, probably for non-canonical
> address 0x375f656c6966ff00: 0000 [#1] SMP PTI
> [ 3626.640538] CPU: 9 PID: 12053 Comm: ls Not tainted 5.13.0-1.dneg.x86_64 #1
> [ 3626.642270] Hardware name: Red Hat dneg, BIOS
> 1.11.1-4.module_el8.2.0+320+13f867d7 04/01/2014
> [ 3626.644443] RIP: 0010:__kmalloc_track_caller+0xfa/0x480
> [ 3626.646138] Code: 65 4c 03 05 28 4d d5 69 49 83 78 10 00 4d 8b 20
> 0f 84 4c 03 00 00 4d 85 e4 0f 84 43 03 00 00 41 8b 47 28 49 8b 3f 48
> 8d 4a 01 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 74 bb 41
> 8b 47
> [ 3626.650253] RSP: 0018:ffffaadecf2afb90 EFLAGS: 00010206
> [ 3626.651747] RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000003d41
> [ 3626.653479] RDX: 0000000000003d40 RSI: 0000000000000cc0 RDI: 000000000002fbe0
> [ 3626.655293] RBP: ffffaadecf2afbd0 R08: ffff985aabc6fbe0 R09: ffff985689c76b20
> [ 3626.657034] R10: ffff9858408a0000 R11: ffff985966e69ec0 R12: 375f656c6966ff00
> [ 3626.658794] R13: 0000000000000000 R14: 0000000000000cc0 R15: ffff985680042200

The above Code: shows the crash happens at

  2a:*  49 8b 1c 04    mov    (%r12,%rax,1),%rbx    <-- trapping instruction

and %r12 (which should be a memory address) is 375f656c6966ff00, which
contains ASCII "file_7".

So my guess is that a file name was copied into a buffer that had already
been freed. This could be caused by a malloc bug somewhere else, but as
the crash was in readdir code, and shows evidence of a file name, it
seems likely that the bug is near by.

Do you have patches to anything that works with file names?

NeilBrown
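
For anyone wanting to repeat the %r12 decoding above: x86-64 is
little-endian, so the register's bytes sit in memory in reverse order
from how the hex value is printed. The following is only a minimal
Python sketch of that check, not a kernel tool. The helper names are
made up for illustration, and the canonical-address test assumes 4-level
paging (48-bit virtual addresses).

```python
import struct

R12 = 0x375F656C6966FF00  # value of %r12 copied from the oops above


def register_bytes(value: int) -> bytes:
    """Return the 8 bytes of a 64-bit register in little-endian (memory) order."""
    return struct.pack("<Q", value)


def printable_ascii(raw: bytes) -> str:
    """Keep printable ASCII characters, replacing everything else with '.'."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits-1) are all zero or all one.

    Assumption: va_bits=48 (4-level paging); 5-level paging would use 57.
    """
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (64 - va_bits + 1)) - 1


raw = register_bytes(R12)
print(raw.hex(" "))          # 00 ff 66 69 6c 65 5f 37
print(printable_ascii(raw))  # ..file_7
print(is_canonical(R12))     # False
```

The two leading non-printable bytes (00 ff) followed by "file_7" are
consistent with string data having overwritten what should have been a
pointer, which is why the address is also non-canonical.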