Re: [PATCH 1/2] vsock: each transport cycles only on its own sockets

"Michael S. Tsirkin" <mst@xxxxxxxxxx> · Thu, 10 Mar 2022 08:16:59 -0500

On Thu, Mar 10, 2022 at 10:11:32PM +0900, Jiyong Park wrote:
> Hi Michael,
> 
> Thanks for looking into this.
> 
> Would you mind if I ask what you mean by incomplete? Is it because non-updated
> modules will still have the issue? Please elaborate.

What Stefano wrote:
	I think there is the same problem if the g2h driver will be
	unloaded (or a reset event is received after a VM migration), it will close
	all sockets of the nested h2g.
It looks like this will keep happening even with your patch, though
I didn't try it.

I also don't like how patch 1 adds code that patch 2 removes. Untidy.
Let's just squash and have downstreams worry about stable ABI.
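
To illustrate the squashed interface, here is a minimal userspace sketch
of an iterator that takes the transport as a parameter and skips sockets
owned by anyone else. It is a model only, not the kernel code: struct
transport, struct sock_model, for_each_connected_socket() and reset_sock()
are hypothetical stand-ins for struct vsock_transport, struct vsock_sock,
vsock_for_each_connected_socket() and the per-transport reset callbacks,
and the locking around the socket table is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct vsock_transport. */
struct transport {
	const char *name;
};

/* Hypothetical stand-in for struct vsock_sock. */
struct sock_model {
	const struct transport *transport; /* transport that owns this socket */
	int reset;                         /* set to 1 once the callback has run */
};

/* Walk the table and call fn only for sockets owned by 'transport'.
 * Doing the filtering here means no callback has to repeat the
 * "is this socket mine?" check.
 */
static void for_each_connected_socket(struct sock_model *table, size_t n,
				      const struct transport *transport,
				      void (*fn)(struct sock_model *sk))
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i].transport != transport)
			continue;
		fn(&table[i]);
	}
}

/* Example callback: mark the socket as reset. */
static void reset_sock(struct sock_model *sk)
{
	sk->reset = 1;
}
```

With the transport passed in by the caller, a reset on the h2g side would
leave the g2h sockets in the same table untouched.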

> 
> On Thu, Mar 10, 2022 at 10:02 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> >
> > On Thu, Mar 10, 2022 at 09:54:24PM +0900, Jiyong Park wrote:
> > > When iterating over sockets using vsock_for_each_connected_socket, make
> > > sure that a transport filters out sockets that don't belong to the
> > > transport.
> > >
> > > There actually was an issue caused by this; in a nested VM
> > > configuration, destroying the nested VM (which often involves the
> > > closing of /dev/vhost-vsock if there were h2g connections to the nested
> > > VM) kills not only the h2g connections, but also all existing g2h
> > > connections to the (outermost) host, which are totally unrelated.
> > >
> > > Tested: Executed the following steps on Cuttlefish (Android running on a
> > > VM) [1]: (1) Enter into an `adb shell` session - to have a g2h
> > > connection inside the VM, (2) open and then close /dev/vhost-vsock by
> > > `exec 3< /dev/vhost-vsock && exec 3<&-`, (3) observe that the adb
> > > session is not reset.
> > >
> > > [1] https://android.googlesource.com/device/google/cuttlefish/
> > >
> > > Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
> > > Signed-off-by: Jiyong Park <jiyong@xxxxxxxxxx>
> > > ---
> > >  drivers/vhost/vsock.c            | 4 ++++
> > >  net/vmw_vsock/virtio_transport.c | 7 +++++++
> > >  net/vmw_vsock/vmci_transport.c   | 5 +++++
> > >  3 files changed, 16 insertions(+)
> > >
> > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > index 37f0b4274113..853ddac00d5b 100644
> > > --- a/drivers/vhost/vsock.c
> > > +++ b/drivers/vhost/vsock.c
> > > @@ -722,6 +722,10 @@ static void vhost_vsock_reset_orphans(struct sock *sk)
> > >        * executing.
> > >        */
> > >
> > > +     /* Only handle our own sockets */
> > > +     if (vsk->transport != &vhost_transport.transport)
> > > +             return;
> > > +
> > >       /* If the peer is still valid, no need to reset connection */
> > >       if (vhost_vsock_get(vsk->remote_addr.svm_cid))
> > >               return;
> >
> >
> > We know this is incomplete though. So I think it's the wrong thing to do
> > when you backport, too. If all you worry about is breaking a binary
> > module interface, how about simply exporting a new function when you
> > backport. Thus you will have downstream both:
> >
> > void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
> >
> > void vsock_for_each_connected_socket_new(struct vsock_transport *transport,
> >                                     void (*fn)(struct sock *sk));
> >
> >
> > and then upstream we can squash these two patches.
> >
> > Hmm?
> >
> >
> > > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > > index fb3302fff627..61b24eb31d4b 100644
> > > --- a/net/vmw_vsock/virtio_transport.c
> > > +++ b/net/vmw_vsock/virtio_transport.c
> > > @@ -24,6 +24,7 @@
> > >  static struct workqueue_struct *virtio_vsock_workqueue;
> > >  static struct virtio_vsock __rcu *the_virtio_vsock;
> > >  static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
> > > +static struct virtio_transport virtio_transport; /* forward declaration */
> > >
> > >  struct virtio_vsock {
> > >       struct virtio_device *vdev;
> > > @@ -357,11 +358,17 @@ static void virtio_vsock_event_fill(struct virtio_vsock *vsock)
> > >
> > >  static void virtio_vsock_reset_sock(struct sock *sk)
> > >  {
> > > +     struct vsock_sock *vsk = vsock_sk(sk);
> > > +
> > >       /* vmci_transport.c doesn't take sk_lock here either.  At least we're
> > >        * under vsock_table_lock so the sock cannot disappear while we're
> > >        * executing.
> > >        */
> > >
> > > +     /* Only handle our own sockets */
> > > +     if (vsk->transport != &virtio_transport.transport)
> > > +             return;
> > > +
> > >       sk->sk_state = TCP_CLOSE;
> > >       sk->sk_err = ECONNRESET;
> > >       sk_error_report(sk);
> > > diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> > > index 7aef34e32bdf..cd2f01513fae 100644
> > > --- a/net/vmw_vsock/vmci_transport.c
> > > +++ b/net/vmw_vsock/vmci_transport.c
> > > @@ -803,6 +803,11 @@ static void vmci_transport_handle_detach(struct sock *sk)
> > >       struct vsock_sock *vsk;
> > >
> > >       vsk = vsock_sk(sk);
> > > +
> > > +     /* Only handle our own sockets */
> > > +     if (vsk->transport != &vmci_transport)
> > > +             return;
> > > +
> > >       if (!vmci_handle_is_invalid(vmci_trans(vsk)->qp_handle)) {
> > >               sock_set_flag(sk, SOCK_DONE);
> > >
> > > --
> > > 2.35.1.723.g4982287a31-goog
> >