Somehow Stefan's reply disappeared from my INBOX (although I did see it),
so I am replying here.

On Mon, 2016-04-11 at 15:54 +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 11, 2016 at 11:45:48AM +0100, Stefan Hajnoczi wrote:
> >
> > On Fri, Apr 08, 2016 at 04:35:05PM +0100, Ian Campbell wrote:
> > >
> > > On Fri, 2016-04-01 at 15:23 +0100, Stefan Hajnoczi wrote:
> > > >
> > > > This series is based on Michael Tsirkin's vhost branch (v4.5-rc6).
> > > >
> > > > I'm about to process Claudio Imbrenda's locking fixes for virtio-vsock but
> > > > first I want to share the latest version of the code. Several people are
> > > > playing with vsock now so sharing the latest code should avoid duplicate work.
> > > Thanks for this, I've been using it in my project and it mostly seems
> > > fine.
> > >
> > > One wrinkle I came across, which I'm not sure is by design or a
> > > problem, is that I can see this sequence coming from the guest (with
> > > other activity in between):
> > >
> > > 1) OP_SHUTDOWN w/ flags == SHUTDOWN_RX
> > > 2) OP_SHUTDOWN w/ flags == SHUTDOWN_TX
> > > 3) OP_SHUTDOWN w/ flags == SHUTDOWN_TX|SHUTDOWN_RX
> > >
> > > I originally had my backend close things down at #2, however this meant
> > > that when #3 arrived it was for a non-existent socket (or, worse, an
> > > active one if the ports got reused). I checked v5 of the spec
> > > proposal[0] which says:
> > >     If these bits are set and there are no more virtqueue buffers
> > >     pending the socket is disconnected.
> > >
> > > but I'm not entirely sure whether this behaviour contradicts that or not
> > > (the bits have both been set by #2, but not at the same time).
> > >
> > > BTW, how does one tell whether there are any more virtqueue buffers
> > > pending while processing the op?
> >
> > #2 is odd. The shutdown bits are sticky so they cannot be cleared once
> > set. I would have expected just #1 and #3. The behavior you observe
> > looks like a bug.
> >
> > The spec text does not convey the meaning of OP_SHUTDOWN well.
> > OP_SHUTDOWN SHUTDOWN_TX|SHUTDOWN_RX means no further rx/tx is possible
> > for this connection. "there are no more virtqueue buffers pending the
> > socket" really means that this isn't an immediate close from the
> > perspective of the application. If the application still has unread rx
> > buffers then the socket stays readable until the rx data has been fully
> > read.

Thanks. Distinguishing the buffer which is local to the application from
the vring would make that clearer, perhaps by not talking about "virtqueue
buffers", since those sound like a vring thing.

However, as Michael observes, I'm not sure that's the whole story.

> Yes but you also wrote:
>     If these bits are set and there are no more virtqueue buffers
>     pending the socket is disconnected.
>
> how does remote know that there are no buffers pending and so it's safe
> to reuse the same source/destination address now?

Indeed, this is one of the things I struggled with. For example, if I send
a SHUTDOWN_RX to my peer, am I supposed to wait for that buffer to come
back (so I know the peer has seen it) and then wait for an entire "cycle"
of the TX ring to know there is nothing still in flight? That's some
tricky book-keeping.

> Maybe destination
> should send RST at that point?

I.e. upon receipt of SHUTDOWN_RX|SHUTDOWN_TX from the peer you are
expected to send a RST, and when the peer observes that RST it knows
there is no further data for that connection on the ring? That sounds
like it would be helpful.
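To check I've understood the intended handling, the backend would then do
something like the sketch below (the struct, the helper name and the bit
values are mine for illustration, not taken from the spec or your code):

/*
 * Sketch only: treat the shutdown flags as sticky by OR-ing them into
 * per-connection state, and only once both directions have been shut
 * down reply with OP_RST and tear the connection down.
 */
#define MY_SHUTDOWN_RX	(1u << 0)	/* placeholder bit layout */
#define MY_SHUTDOWN_TX	(1u << 1)

struct my_conn {
	unsigned int peer_shutdown;	/* accumulated flags, never cleared */
};

/*
 * Called for each OP_SHUTDOWN from the guest; returns non-zero when the
 * backend should queue an OP_RST and free the connection state.
 */
static int my_handle_shutdown(struct my_conn *c, unsigned int flags)
{
	c->peer_shutdown |= flags;

	return (c->peer_shutdown & (MY_SHUTDOWN_RX | MY_SHUTDOWN_TX)) ==
	       (MY_SHUTDOWN_RX | MY_SHUTDOWN_TX);
}

(With the bits being sticky, #2 in the sequence above already leaves both
directions shut down, which I suppose is why you would only expect to see
#1 and #3.)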
> > > Another thing I noticed, which is really more to do with the generic
> > > AF_VSOCK bits than anything to do with your patches, is that there are
> > > no limitations on which vsock ports a non-privileged user can bind to
> > > and, relatedly, that there is no netns support, so e.g. users in
> > > unprivileged containers can bind to any vsock port and talk to the
> > > host, which might be undesirable. For my use for now I just went with
> > > the big hammer approach of denying access from anything other than the
> > > init_net namespace[1] while I consider what the right answer is.
> >
> > From the vhost point of view each netns should have its own AF_VSOCK
> > namespace. This way two containers could act as "the host" (CID 2) for
> > their respective guests.

When you say "should" you mean that is the intended design, as opposed to
what the current code actually does, right?

Ian.
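P.S. In case it is useful to anyone else playing with this, the "big
hammer" in [1] amounts to something like the sketch below. It is not the
literal patch: the hook point (the AF_VSOCK socket create path) and the
error code are just illustrative.

#include <linux/errno.h>
#include <linux/net.h>
#include <net/net_namespace.h>

/*
 * Sketch: refuse AF_VSOCK sockets outside the initial network namespace
 * until proper netns support exists.
 */
static int vsock_create(struct net *net, struct socket *sock,
			int protocol, int kern)
{
	if (!net_eq(net, &init_net))
		return -EAFNOSUPPORT;	/* error code is a guess */

	/* ... existing AF_VSOCK socket creation continues here ... */
	return 0;
}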