On Sunday 16 August 2009, Michael S. Tsirkin wrote: > On Fri, Aug 14, 2009 at 01:40:36PM +0200, Arnd Bergmann wrote: > > > > * most of the transports are sockets, tap uses a character device. > > This could be dealt with by having both a struct socket * in > > struct vhost_net *and* a struct file *, or by always keeping the > > struct file and calling vfs_readv/vfs_writev for the data transport > > in both cases. > > I am concerned that character devices might have weird side effects with > read/write operations and that calling them from kernel thread the way I > do might have security implications. Can't point at anything specific > though at the moment. I understand your feelings about passing a chardev fd into your driver and I agree that we need to be very careful if we want to allow it. Maybe we could instead extend the 'splice' system call to work on a vhost_net file descriptor. If we do that, we can put the access back into a user thread (or two) that stays in splice indefinetely to avoid some of the implications of kernel threads like the missing ability to handle transfer errors in user space. > I wonder - can we expose the underlying socket used by tap, or will that > create complex lifetime issues? I think this could get more messy in the long run than calling vfs_readv on a random fd. It would mean deep internal knowledge of the tap driver in vhost_net, which I really would prefer to avoid. > > * Each transport has a slightly different header, we have > > - raw ethernet frames (raw, udp multicast, tap) > > - 32-bit length + raw frames, possibly fragmented (tcp) > > - 80-bit header + raw frames, possibly fragmented (tap with vnet_hdr) > > To handle these three cases, we need either different ioctl numbers > > so that vhost_net can choose the right one, or a flags field in > > VHOST_NET_SET_SOCKET, like > > > > #define VHOST_NET_RAW 1 > > #define VHOST_NET_LEN_HDR 2 > > #define VHOST_NET_VNET_HDR 4 > > > > struct vhost_net_socket { > > unsigned int flags; > > int fd; > > }; > > #define VHOST_NET_SET_SOCKET _IOW(VHOST_VIRTIO, 0x30, struct vhost_net_socket) > > It seems we can query the socket to find out the type, yes, I understand that you can do that, but I still think that decision should be left to user space. Adding a length header for TCP streams but not for UDP is something that we would normally want to do, but IMHO vhost_net should not need to know about this. > or use the features ioctl. Right, I had forgotten about that one. It's probably equivalent to the flags I suggested, except that one allows you to set features after starting the communication, while the other one prevents you from doing that. > > Qemu could then automatically try to use vhost_net, if it's available > > in the kernel, or just fall back on software vlan otherwise. > > Does that make sense? > > I agree, long term it should be enabled automatically when possible. So how about making the qemu command line interface an extension to what Or Gerlitz has done for the raw packet sockets? Arnd <>< -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html