On Mon, Jan 13, 2025 at 11:12:52AM +0100, Michal Luczaj wrote:
On 1/13/25 10:07, Stefano Garzarella wrote:
On Mon, 13 Jan 2025 at 09:57, Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
On Sun, Jan 12, 2025 at 11:42:30PM +0100, Michal Luczaj wrote:
[...]
So, if I get this right:
1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
2. transport->release() calls vsock_remove_bound() without checking if sk
was bound and moved to bound list (refcnt=1)
3. vsock_bind() assumes sk is in unbound list and before
__vsock_insert_bound(vsock_bound_sockets()) calls
__vsock_remove_bound() which does:
list_del_init(&vsk->bound_table); // nop
sock_put(&vsk->sk); // refcnt=0
The following fixes things for me. I'm just not certain that's the only
place where transport destruction may lead to an unbound socket being
removed from the unbound list.
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 7f7de6d88096..0fe807c8c052 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1303,7 +1303,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
if (remove_sock) {
sock_set_flag(sk, SOCK_DONE);
- virtio_transport_remove_sock(vsk);
+ if (vsock_addr_bound(&vsk->local_addr))
+ virtio_transport_remove_sock(vsk);
I don't get this fix, virtio_transport_remove_sock() calls
vsock_remove_sock()
vsock_remove_bound()
if (__vsock_in_bound_table(vsk))
__vsock_remove_bound(vsk);
So, should already avoid this issue, no?
I got it wrong, I see now what are you trying to do, but I don't think
we should skip virtio_transport_remove_sock() entirely, it also purge
the rx_queue.
Isn't rx_queue empty-by-definition in case of !__vsock_in_bound_table(vsk)?
It could be.
But I see some other issues:
- we need to fix also in the other transports, since they do the same
- we need to check delayed cancel work too that call
virtio_transport_remove_sock()
An alternative approach, which would perhaps allow us to avoid all this,
is to re-insert the socket in the unbound list after calling release()
when we deassign the transport.
WDYT?
Stefano
Can the problem be in vsock_bind() ?
Well, I wouldn't say so.
Is this issue pre-existing or introduced by this series?
I think this is pre-existing, can you confirm?
Yup, I agree, pre-existing.
In that case, I'd not stop this series, and fix it in another patch/series.
Yeah, sure thing.
Thanks,
Michal