On Fri, May 29, 2020 at 11:21:02PM +0800, Jia He wrote:
> When client tries to connect(SOCK_STREAM) the server in the guest with
> NONBLOCK mode, there will be a panic on a ThunderX2 (armv8a server):
> [ 463.718844][ T5040] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> [ 463.718848][ T5040] Mem abort info:
> [ 463.718849][ T5040] ESR = 0x96000044
> [ 463.718852][ T5040] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 463.718853][ T5040] SET = 0, FnV = 0
> [ 463.718854][ T5040] EA = 0, S1PTW = 0
> [ 463.718855][ T5040] Data abort info:
> [ 463.718856][ T5040] ISV = 0, ISS = 0x00000044
> [ 463.718857][ T5040] CM = 0, WnR = 1
> [ 463.718859][ T5040] user pgtable: 4k pages, 48-bit VAs, pgdp=0000008f6f6e9000
> [ 463.718861][ T5040] [0000000000000000] pgd=0000000000000000
> [ 463.718866][ T5040] Internal error: Oops: 96000044 [#1] SMP
> [...]
> [ 463.718977][ T5040] CPU: 213 PID: 5040 Comm: vhost-5032 Tainted: G O 5.7.0-rc7+ #139
> [ 463.718980][ T5040] Hardware name: GIGABYTE R281-T91-00/MT91-FS1-00, BIOS F06 09/25/2018
> [ 463.718982][ T5040] pstate: 60400009 (nZCv daif +PAN -UAO)
> [ 463.718995][ T5040] pc : virtio_transport_recv_pkt+0x4c8/0xd40 [vmw_vsock_virtio_transport_common]
> [ 463.718999][ T5040] lr : virtio_transport_recv_pkt+0x1fc/0xd40 [vmw_vsock_virtio_transport_common]
> [ 463.719000][ T5040] sp : ffff80002dbe3c40
> [...]
> [ 463.719025][ T5040] Call trace:
> [ 463.719030][ T5040] virtio_transport_recv_pkt+0x4c8/0xd40 [vmw_vsock_virtio_transport_common]
> [ 463.719034][ T5040] vhost_vsock_handle_tx_kick+0x360/0x408 [vhost_vsock]
> [ 463.719041][ T5040] vhost_worker+0x100/0x1a0 [vhost]
> [ 463.719048][ T5040] kthread+0x128/0x130
> [ 463.719052][ T5040] ret_from_fork+0x10/0x18
        ^          ^
Maybe we can remove these two columns from the commit message.

>
> The race condition as follows:
> Task1                            Task2
> =====                            =====
> __sock_release                   virtio_transport_recv_pkt
>   __vsock_release                  vsock_find_bound_socket (found)
>     lock_sock_nested
>     vsock_remove_sock
>     sock_orphan
>       sk_set_socket(sk, NULL)

Here we can add:
        sk->sk_shutdown = SHUTDOWN_MASK;

>     ...
>     release_sock
>                                    lock_sock
>                                    virtio_transport_recv_connecting
>                                      sk->sk_socket->state (panic)
>
> The root cause is that vsock_find_bound_socket can't hold the lock_sock,
> so there is a small race window between vsock_find_bound_socket() and
> lock_sock(). If there is __vsock_release() in another task, sk->sk_socket
> will be set to NULL inadvertently.
>
> This fixes it by checking sk->sk_shutdown.
>
> Signed-off-by: Jia He <justin.he@xxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Cc: Stefano Garzarella <sgarzare@xxxxxxxxxx>
> ---
> v2: use lightweight checking suggested by Stefano Garzarella
>
>  net/vmw_vsock/virtio_transport_common.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 69efc891885f..0edda1edf988 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -1132,6 +1132,14 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>
>  	lock_sock(sk);
>
> +	/* Check if sk has been released before lock_sock */
> +	if (sk->sk_shutdown == SHUTDOWN_MASK) {
> +		(void)virtio_transport_reset_no_sock(t, pkt);
> +		release_sock(sk);
> +		sock_put(sk);
> +		goto free_pkt;
> +	}
> +
>  	/* Update CID in case it has changed after a transport reset event */
>  	vsk->local_addr.svm_cid = dst.svm_cid;
>
> --
> 2.17.1
>

Anyway, the patch LGTM, let's see what David and others say.

Reviewed-by: Stefano Garzarella <sgarzare@xxxxxxxxxx>

Thanks,
Stefano
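
P.S. For anyone following the thread, the ordering in the race diagram can be modelled in plain userspace C. This is only a rough sketch under stated assumptions: `model_sock`, `model_socket`, `model_release()` and `model_recv_pkt()` are hypothetical stand-ins invented for illustration, not the kernel structures or functions, and locking is left out because the two steps are simply run one after the other. It shows why testing sk_shutdown before touching sk_socket avoids the NULL dereference.

```c
/* Userspace model only: the types and helpers below are simplified,
 * hypothetical stand-ins, not the kernel definitions. */
#include <assert.h>
#include <stddef.h>

#define RCV_SHUTDOWN  1
#define SEND_SHUTDOWN 2
#define SHUTDOWN_MASK 3	/* both directions shut down */

struct model_socket {
	int state;
};

struct model_sock {
	struct model_socket *sk_socket;
	unsigned char sk_shutdown;
};

enum model_result { MODEL_RESET_SENT, MODEL_HANDLED };

/* Task1: the part of the release path that runs under the socket lock. */
static void model_release(struct model_sock *sk)
{
	assert(sk != NULL);
	sk->sk_socket = NULL;			/* sock_orphan -> sk_set_socket(sk, NULL) */
	sk->sk_shutdown = SHUTDOWN_MASK;	/* the step suggested for the diagram */
}

/* Task2: the receive path after it has taken the socket lock. */
static enum model_result model_recv_pkt(struct model_sock *sk, int *state_out)
{
	assert(sk != NULL);

	/* Mirror of the proposed check: bail out if sk was already released. */
	if (sk->sk_shutdown == SHUTDOWN_MASK)
		return MODEL_RESET_SENT;

	/* Only reached while sk_socket is still valid. */
	*state_out = sk->sk_socket->state;
	return MODEL_HANDLED;
}
```

Calling model_recv_pkt() on a live socket reads the state; calling it after model_release() takes the early return and never dereferences the NULL sk_socket, which is the window the patch closes.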