On Fri, Oct 20, 2023 at 12:12:04AM +0300, Alexandru Matei wrote:
On 10/19/2023 11:54 AM, Stefano Garzarella wrote:
On Wed, Oct 18, 2023 at 09:32:47PM +0300, Alexandru Matei wrote:
Once VQs are filled with empty buffers and we kick the host, it can send
connection requests. If 'the_virtio_vsock' is not initialized before,
replies are silently dropped and do not reach the host.
Are replies really dropped, or do we just miss the notification?
Could the reverse now happen, i.e., the guest wants to send a connection request, finds the pointer assigned but can't use virtqueues because they haven't been initialized yet?
Perhaps to avoid your problem, we could just queue vsock->rx_work at the bottom of the probe to see if anything was queued in the meantime.
Nit: please use "vsock/virtio" to indicate that this problem is specific to the virtio transport.
Thanks,
Stefano
The replies are dropped; the scenario goes like this:
Once rx_run is set to true and rx queue is filled with empty buffers, the host sends a connection request.
Oh, I see now, I thought virtio_transport_rx_work() returned early if
'the_virtio_vsock' was not set.
The request is processed in virtio_transport_recv_pkt(), and since there is no bound socket, it calls virtio_transport_reset_no_sock() which tries to send a reset packet.
In virtio_transport_send_pkt() it checks 'the_virtio_vsock' and because it is null it exits with -ENODEV, basically dropping the packet.
I looked at your scenario, and there is an issue from the moment we set the_virtio_vsock (in this patch) up until vsock->tx_run is set to TRUE.
virtio_transport_send_pkt() will queue the packet, but virtio_transport_send_pkt_work() will exit because tx_run is FALSE. This could be fixed by moving rcu_assign_pointer() after tx_run is set to TRUE.
virtio_transport_cancel_pkt() uses the rx virtqueue once the_virtio_vsock is set, so rcu_assign_pointer() should be moved after virtio_find_vqs() is called.
I think the way to go is to split virtio_vsock_vqs_init() in two:
virtio_vsock_vqs_init() and virtio_vsock_vqs_fill(), as Vadim
suggested. This should fix all the cases:
Yep, LGTM!
Thank you both for the fix, please send a v2 with this approach!
Stefano
---
net/vmw_vsock/virtio_transport.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index ad64f403536a..1f95f98ddd3f 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -594,6 +594,11 @@ static int virtio_vsock_vqs_init(struct virtio_vsock *vsock)
vsock->tx_run = true;
mutex_unlock(&vsock->tx_lock);
+ return 0;
+}
+
+static void virtio_vsock_vqs_fill(struct virtio_vsock *vsock)
+{
mutex_lock(&vsock->rx_lock);
virtio_vsock_rx_fill(vsock);
vsock->rx_run = true;
@@ -603,8 +608,6 @@ static int virtio_vsock_vqs_init(struct virtio_vsock *vsock)
virtio_vsock_event_fill(vsock);
vsock->event_run = true;
mutex_unlock(&vsock->event_lock);
-
- return 0;
}
static void virtio_vsock_vqs_del(struct virtio_vsock *vsock)
@@ -707,6 +710,7 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
goto out;
rcu_assign_pointer(the_virtio_vsock, vsock);
+ virtio_vsock_vqs_fill(vsock);
mutex_unlock(&the_virtio_vsock_mutex);
@@ -779,6 +783,7 @@ static int virtio_vsock_restore(struct virtio_device *vdev)
goto out;
rcu_assign_pointer(the_virtio_vsock, vsock);
+ virtio_vsock_vqs_fill(vsock);
out:
mutex_unlock(&the_virtio_vsock_mutex);
--