On Wed, Aug 19, 2009 at 11:04:50AM +0200, Arnd Bergmann wrote: > On Sunday 16 August 2009, Michael S. Tsirkin wrote: > > On Fri, Aug 14, 2009 at 01:40:36PM +0200, Arnd Bergmann wrote: > > > > > > * most of the transports are sockets, tap uses a character device. > > > This could be dealt with by having both a struct socket * in > > > struct vhost_net *and* a struct file *, or by always keeping the > > > struct file and calling vfs_readv/vfs_writev for the data transport > > > in both cases. > > > > I am concerned that character devices might have weird side effects with > > read/write operations and that calling them from kernel thread the way I > > do might have security implications. Can't point at anything specific > > though at the moment. > > I understand your feelings about passing a chardev fd into your driver > and I agree that we need to be very careful if we want to allow it. > > Maybe we could instead extend the 'splice' system call to work on a > vhost_net file descriptor. If we do that, we can put the access back > into a user thread (or two) that stays in splice indefinetely An issue with exposing internal threading model to userspace in this way is that we lose control of e.g. CPU locality - and it is very hard for userspace to get it right. > to > avoid some of the implications of kernel threads like the missing > ability to handle transfer errors in user space. Are you talking about TCP here? Transfer errors are typically asynchronous - possibly eventfd as I expose for vhost net is sufficient there. > > I wonder - can we expose the underlying socket used by tap, or will that > > create complex lifetime issues? > > I think this could get more messy in the long run than calling vfs_readv > on a random fd. It would mean deep internal knowledge of the tap driver > in vhost_net, which I really would prefer to avoid. No, what I had in mind is adding a GET_SOCKET ioctl to tap. vhost would then just use the socket. > > > * Each transport has a slightly different header, we have > > > - raw ethernet frames (raw, udp multicast, tap) > > > - 32-bit length + raw frames, possibly fragmented (tcp) > > > - 80-bit header + raw frames, possibly fragmented (tap with vnet_hdr) > > > To handle these three cases, we need either different ioctl numbers > > > so that vhost_net can choose the right one, or a flags field in > > > VHOST_NET_SET_SOCKET, like > > > > > > #define VHOST_NET_RAW 1 > > > #define VHOST_NET_LEN_HDR 2 > > > #define VHOST_NET_VNET_HDR 4 > > > > > > struct vhost_net_socket { > > > unsigned int flags; > > > int fd; > > > }; > > > #define VHOST_NET_SET_SOCKET _IOW(VHOST_VIRTIO, 0x30, struct vhost_net_socket) > > > > It seems we can query the socket to find out the type, > > yes, I understand that you can do that, but I still think that decision > should be left to user space. Adding a length header for TCP streams but > not for UDP is something that we would normally want to do, but IMHO > vhost_net should not need to know about this. > > > or use the features ioctl. > > Right, I had forgotten about that one. It's probably equivalent > to the flags I suggested, except that one allows you to set features > after starting the communication, while the other one prevents > you from doing that. > > > > Qemu could then automatically try to use vhost_net, if it's available > > > in the kernel, or just fall back on software vlan otherwise. > > > Does that make sense? > > > > I agree, long term it should be enabled automatically when possible. > > So how about making the qemu command line interface an extension to > what Or Gerlitz has done for the raw packet sockets? > > Arnd <>< Not sure I see the connection, but I have not thought about qemu side of things too much yet - trying to get kernel bits in place first so that there's a stable ABI to work with. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html