On Sunday, June 7, 2020 10:32:44 AM CDT Hans-Peter Jansen wrote:
> Hi,
>
> after upgrading the kernel from 5.6.11 to 5.6.14, we suffer from regular
> crashes of nfsd here:
>
> 2020-06-07T01:32:43.600306+02:00 server rpc.mountd[2664]: authenticated mount request from 192.168.3.16:303 for /work (/work)
> 2020-06-07T01:32:43.602594+02:00 server rpc.mountd[2664]: authenticated mount request from 192.168.3.16:304 for /work/vmware (/work)
> 2020-06-07T01:32:43.602971+02:00 server rpc.mountd[2664]: authenticated mount request from 192.168.3.16:305 for /work/vSphere (/work)
> 2020-06-07T01:32:43.606276+02:00 server kernel: [51901.089211] general protection fault, probably for non-canonical address 0xb9159d506ba40000: 0000 [#1] SMP PTI
> 2020-06-07T01:32:43.606284+02:00 server kernel: [51901.089226] CPU: 1 PID: 3190 Comm: nfsd Tainted: G O 5.6.14-lp151.2-default #1 openSUSE Tumbleweed (unreleased)
> 2020-06-07T01:32:43.606286+02:00 server kernel: [51901.089234] Hardware name: System manufacturer System Product Name/P7F-E, BIOS 0906 09/20/2010
> 2020-06-07T01:32:43.606287+02:00 server kernel: [51901.089247] RIP: 0010:cgroup_sk_free+0x26/0x80
> 2020-06-07T01:32:43.606288+02:00 server kernel: [51901.089257] Code: 00 00 00 00 66 66 66 66 90 53 48 8b 07 48 c7 c3 30 72 07 b6 a8 01 75 07 48 85 c0 48 0f 45 d8 48 8b 83 18 09 00 00 a8 03 75 1a <65> 48 ff 08 f6 43 7c 01 74 02 5b c3 48 8b 43 18 a8 03 75 26 65 48
> 2020-06-07T01:32:43.606290+02:00 server kernel: [51901.089276] RSP: 0018:ffffb248c21e7e10 EFLAGS: 00010246
> 2020-06-07T01:32:43.606291+02:00 server kernel: [51901.089280] RAX: b91603a504000000 RBX: ffff99ab141a0000 RCX: 0000000000000021
> 2020-06-07T01:32:43.606292+02:00 server kernel: [51901.089284] RDX: ffffffffb6135ec4 RSI: 0000000000010080 RDI: ffff99a7159c1490
> 2020-06-07T01:32:43.606293+02:00 server kernel: [51901.089287] RBP: ffff99a7159c1200 R08: ffff99ab67a60c60 R09: 000000000002eb00
> 2020-06-07T01:32:43.606294+02:00 server kernel: [51901.089291] R10: ffffb248c0087dc0 R11: 00000000000000c6 R12: 0000000000000000
> 2020-06-07T01:32:43.606295+02:00 server kernel: [51901.089294] R13: 0000000000000103 R14: ffff99aae4934238 R15: ffff99ab31902000
> 2020-06-07T01:32:43.606296+02:00 server kernel: [51901.089299] FS: 0000000000000000(0000) GS:ffff99ab67a40000(0000) knlGS:0000000000000000
> 2020-06-07T01:32:43.606297+02:00 server kernel: [51901.089303] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2020-06-07T01:32:43.606303+02:00 server kernel: [51901.089305] CR2: 00000000008e0000 CR3: 00000004df60a000 CR4: 00000000000026e0
> 2020-06-07T01:32:43.606304+02:00 server kernel: [51901.089307] Call Trace:
> 2020-06-07T01:32:43.606305+02:00 server kernel: [51901.089315] __sk_destruct+0x10d/0x1d0
> 2020-06-07T01:32:43.606306+02:00 server kernel: [51901.089319] inet_release+0x34/0x60
> 2020-06-07T01:32:43.606307+02:00 server kernel: [51901.089325] __sock_release+0x81/0xb0
> 2020-06-07T01:32:43.606308+02:00 server kernel: [51901.089358] svc_sock_free+0x38/0x60 [sunrpc]
> 2020-06-07T01:32:43.606308+02:00 server kernel: [51901.089374] svc_xprt_put+0x99/0xe0 [sunrpc]
> 2020-06-07T01:32:43.606310+02:00 server kernel: [51901.089389] svc_recv+0x9c0/0xa40 [sunrpc]
> 2020-06-07T01:32:43.606310+02:00 server kernel: [51901.089410] ? nfsd_destroy+0x60/0x60 [nfsd]
> 2020-06-07T01:32:43.606311+02:00 server kernel: [51901.089417] nfsd+0xd1/0x150 [nfsd]
> 2020-06-07T01:32:43.606312+02:00 server kernel: [51901.089420] kthread+0x10d/0x130
> 2020-06-07T01:32:43.606313+02:00 server kernel: [51901.089423] ? kthread_park+0x90/0x90
> 2020-06-07T01:32:43.606314+02:00 server kernel: [51901.089426] ret_from_fork+0x35/0x40
>
> A vSphere 5.5 host accesses this linux server with nfs v3 for backup
> purposes (a Veeam backup server want to store a new backup here).
>
> The kernel is tainted due to vboxdrv. The OS is openSUSE Leap 15.1,
> with the kernel and Virtualbox replaced with uptodate versions from
> proper rpm packages (built on that very vSphere host in a OBS server
> VM..).
>
> I used to be subscribed to this ML, but that subscription has been
> lost 04/09, thus I cannot reply properly to the general prot. fault
> thread, started 05/12 from syzbot with Bruce looking into it.
>
> It seems somewhat related.
>
> Interestingly, we're using a couple of NFS v4 mounts for subsets of
> home here, and mount /work and other shares from various
> Tumbleweed systems with NFS v4 here without any undesired effects.
>
> Since the kernel upgrade, every time, this Veeam thing triggers these
> v3 mounts, the crash happens. I've disabled this backup target for now
> until the problem is resolved, because it effectively prevents further
> nfs accesses to this server, and blocks our desktops until the server
> is rebooted.
>
> A cursory look into 5.6.{15,16} changelogs seems to imply, that this
> issue is still pending.
>
> Let me know, if I can provide any further info's.
>
> Thanks,
> Pete

I see similar issues in Fedora kernels 5.6.14 through 5.6.16
https://bugzilla.redhat.com/show_bug.cgi?id=1839287

On the client I mount /home with sec=krb5p, and /mnt/koji with sec=krb5

-- 
Anthony - https://messinet.com
F9B6 560E 68EA 037D 8C3D D1C9 FF31 3BDB D9D8 99B6