> On Sep 13, 2017, at 5:19 PM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Wed 13-09-17 15:07:26, Jorgen S. Hansen wrote:
>>
>>> On Sep 12, 2017, at 11:08 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>>>
>>> Hi,
>>> we are seeing the following splat with Debian 3.16 stable kernel
>>>
>>> BUG: scheduling while atomic: MATLAB/26771/0x00000100
>>> Modules linked in: veeamsnap(O) hmac cbc cts nfsv4 dns_resolver rpcsec_gss_krb5 nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc vmw_vso$
>>> CPU: 0 PID: 26771 Comm: MATLAB Tainted: G O 3.16.0-4-amd64 #1 Debian 3.16.7-ckt20-1+deb8u3
>>> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
>>> ffff88315c1e4c20 ffffffff8150db3f ffff88193f803dc8 ffffffff8150acdf
>>> ffffffff815103a2 0000000000012f00 ffff8819423dbfd8 0000000000012f00
>>> ffff88315c1e4c20 ffff88193f803dc8 ffff88193f803d50 ffff88193f803dc0
>>> Call Trace:
>>> <IRQ> [<ffffffff8150db3f>] ? dump_stack+0x41/0x51
>>> [<ffffffff8150acdf>] ? __schedule_bug+0x48/0x55
>>> [<ffffffff815103a2>] ? __schedule+0x5d2/0x700
>>> [<ffffffff8150f9b9>] ? schedule_timeout+0x229/0x2a0
>>> [<ffffffff8109ba70>] ? select_task_rq_fair+0x390/0x700
>>> [<ffffffff8109f780>] ? check_preempt_wakeup+0x120/0x1d0
>>> [<ffffffff81510eb8>] ? wait_for_completion+0xa8/0x120
>>> [<ffffffff81096de0>] ? wake_up_state+0x10/0x10
>>> [<ffffffff810c3da0>] ? call_rcu_bh+0x20/0x20
>>> [<ffffffff810c180b>] ? wait_rcu_gp+0x4b/0x60
>>> [<ffffffff810c17b0>] ? ftrace_raw_output_rcu_utilization+0x40/0x40
>>> [<ffffffffa02ca6f5>] ? vmci_event_unsubscribe+0x75/0xb0 [vmw_vmci]
>>> [<ffffffffa031f5cd>] ? vmci_transport_destruct+0x1d/0xe0 [vmw_vsock_vmci_transport]
>>> [<ffffffffa03167e3>] ? vsock_sk_destruct+0x13/0x60 [vsock]
>>> [<ffffffff81409f7a>] ? __sk_free+0x1a/0x130
>>> [<ffffffffa0320218>] ? vmci_transport_recv_stream_cb+0x1e8/0x2d0 [vmw_vsock_vmci_transport]
>>> [<ffffffffa02c9cba>] ? vmci_datagram_invoke_guest_handler+0xaa/0xd0 [vmw_vmci]
>>> [<ffffffffa02cab51>] ? vmci_dispatch_dgs+0xc1/0x200 [vmw_vmci]
>>> [<ffffffff8106c294>] ? tasklet_action+0xf4/0x100
>>> [<ffffffff8106c681>] ? __do_softirq+0xf1/0x290
>>> [<ffffffff8106ca55>] ? irq_exit+0x95/0xa0
>>> [<ffffffff81516b22>] ? do_IRQ+0x52/0xe0
>>> [<ffffffff8151496d>] ? common_interrupt+0x6d/0x6d
>>>
>>> AFAICS this has been fixed by 4ef7ea9195ea ("VSOCK: sock_put wasn't safe
>>> to call in interrupt context") but this patch hasn't been backported to
>>> stable trees. It applies cleanly on top of 3.16 stable tree but I am not
>>> familiar with the code to send the backport to the stable maintainer
>>> directly.
>>>
>>> Could you double check that the patch below (just a blind cherry-pick)
>>> is correct and it doesn't need additional patches on top?
>>
>> Hi,
>>
>> The patch below has been used to fix the above issue by other distros
>> - among them Redhat for the 3.10 kernel, so it should work for 3.16 as
>> well.
>
> Thanks for the confirmation. I do not see 4ef7ea9195ea ("VSOCK: sock_put
> wasn't safe to call in interrupt context") in 3.10 stable branch
> though.
>
>> In addition to the patch above, there are two other patches that
>> need to be applied on top for the fix to be correct:
>>
>> 8566b86ab9f0f45bc6f7dd422b21de9d0cf5415a "VSOCK: Fix lockdep issue."
>>
>> and
>>
>> 8ab18d71de8b07d2c4d6f984b718418c09ea45c5 "VSOCK: Detach QP check should filter out non matching QPs."
>
> Good to know. I will send all three patches cherry-picked on top of the
> current 3.16 stable branch. Could you have a look please?

The patch series look good to me.

Thanks for taking care of this,
Jorgen