On 10/26/21 12:37 AM, Jason Wang wrote:
>
> On 2021/10/22 1:19 PM, Mike Christie wrote:
>> This patch allows userspace to create workers and bind them to vqs. You
>> can have N workers per dev and also share N workers with M vqs.
>>
>> Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx>
>
>
> A question, who is the best one to determine the binding? Is it the VMM
> (Qemu etc) or the management stack? If the latter, it looks to me it's
> better to expose this via sysfs?

I thought the settings would live in the management app, and the
management app would then talk to the qemu control interface like it does
when it adds new devices on the fly.

A problem with the management app doing it directly is handling the
RLIMIT_NPROC review comment. This patchset:

https://lore.kernel.org/all/20211007214448.6282-1-michael.christie@xxxxxxxxxx/

basically has the kernel do a clone() from the caller's context. So adding
a worker is like doing the VHOST_SET_OWNER ioctl: it still has to be done
from a process whose values can be inherited, like the mm, cgroups, and
now RLIMITs.

>> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
>> index f7f6a3a28977..af654e3cef0e 100644
>> --- a/include/uapi/linux/vhost_types.h
>> +++ b/include/uapi/linux/vhost_types.h
>> @@ -47,6 +47,18 @@ struct vhost_vring_addr {
>>       __u64 log_guest_addr;
>>   };
>> +#define VHOST_VRING_NEW_WORKER -1
>
>
> Do we need VHOST_VRING_FREE_WORKER? And I wonder if using dedicated
> ioctls are better:
>
> VHOST_VRING_NEW/FREE_WORKER
> VHOST_VRING_ATTACH_WORKER

We didn't need a free worker, because the kernel handles it for userspace.
I tried to make it easy for userspace because in some cases it may not be
able to do syscalls like close on the device. For example, if qemu crashes,
or for vhost-scsi where we don't do an explicit close during VM shutdown.

So we start off with the default worker thread that's used by all vqs, like
we do today. Userspace can then override it by creating a new worker.
That also unbinds/detaches the existing worker and does a put on that
worker's refcount. We also do a put on the worker when we stop using it
during device shutdown/closure/release. When the worker's refcount goes to
zero, the kernel deletes it.

I think separating the calls could be helpful though.
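
To make the lifetime rules above concrete, here is a rough userspace model
of the refcounting. It is only a sketch and not the code from the patch:
struct worker, struct vq, worker_create(), worker_put(),
vq_attach_worker(), dev_release() and the workers_freed counter are all
made-up names for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model only; none of these names come from the patch. */
struct worker {
	int id;
	int refcount;		/* number of vqs currently bound to this worker */
};

struct vq {
	struct worker *worker;	/* worker currently bound to this vq */
};

static int workers_freed;	/* counts deletions so the model can be checked */

static struct worker *worker_create(int id)
{
	struct worker *w = malloc(sizeof(*w));

	if (!w)
		return NULL;
	w->id = id;
	w->refcount = 0;	/* no vq is bound yet */
	return w;
}

/* Drop one reference; the worker is deleted when the count hits zero. */
static void worker_put(struct worker *w)
{
	if (w && --w->refcount == 0) {
		workers_freed++;
		free(w);
	}
}

/* Bind w to vq: take a reference on w, then put the worker it replaces. */
static void vq_attach_worker(struct vq *vq, struct worker *w)
{
	struct worker *old;

	assert(vq && w);
	old = vq->worker;
	w->refcount++;
	vq->worker = w;
	worker_put(old);	/* no-op if the vq had no worker yet */
}

/* Device shutdown/close/release: every vq puts the worker it was using. */
static void dev_release(struct vq *vqs, int nvqs)
{
	for (int i = 0; i < nvqs; i++) {
		worker_put(vqs[i].worker);
		vqs[i].worker = NULL;
	}
}
```

In this model, binding one default worker to every vq and then attaching a
new worker to a single vq drops the default worker's count by one without
freeing it. Both workers only go away once dev_release() has put the last
references, so userspace never has to issue an explicit free.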