On 10/22/21 11:12 AM, michael.christie@xxxxxxxxxx wrote: > On 10/22/21 5:47 AM, Michael S. Tsirkin wrote: >>> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h >>> index c998860d7bbc..e5c0669430e5 100644 >>> --- a/include/uapi/linux/vhost.h >>> +++ b/include/uapi/linux/vhost.h >>> @@ -70,6 +70,17 @@ >>> #define VHOST_VRING_BIG_ENDIAN 1 >>> #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state) >>> #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state) >>> +/* By default, a device gets one vhost_worker created during VHOST_SET_OWNER >>> + * that its virtqueues share. This allows userspace to create a vhost_worker >>> + * and map a virtqueue to it or map a virtqueue to an existing worker. >>> + * >>> + * If pid > 0 and it matches an existing vhost_worker thread it will be bound >>> + * to the vq. If pid is VHOST_VRING_NEW_WORKER, then a new worker will be >>> + * created and bound to the vq. >>> + * >>> + * This must be called after VHOST_SET_OWNER and before the vq is active. >>> + */ >> >> A couple of things here: >> it's probably a good idea not to make it match pid exactly, >> if for no other reason than I'm not sure we want to >> commit this being a pid. Let's just call it an id? > > Ok. > >> And maybe byteswap it or xor with some value >> just to make sure userspace does not begin abusing it anyway. >> >> Also, interaction with pid namespace is unclear to me. >> Can you document what happens here? > > This current patchset only allows the vhost_dev owner to > create/bind workers for devices it owns, so namespace don't come I made a mistake here. The patches do restrict VHOST_SET_VRING_WORKER to the same owner like I wrote. However, it looks like we could have 2 threads with the same mm pointer so vhost_dev_check_owner returns true, but they could be in different namespaces. Even though we are not going to pass the pid_t between user/kernel space, should I add a pid namespace check when I repost the patches? > into play. If a thread from another namespace tried to create/bind > a worker we would hit the owner checks in vhost_dev_ioctl which is > done before vhost_vring_ioctl normally (for vdpa we hit the use_worker > check and fail there). > > However, with the kernel worker API changes the worker threads will > now be in the vhost dev owner's namespace and not the kthreadd/default > one, so in the future we are covered if we want to do something more > advanced. For example, I've seen people working on an API to export the > worker pids: > > https://lore.kernel.org/netdev/20210507154332.hiblsd6ot5wzwkdj@steredhat/T/ > > and in the future for interfaces that export that info we could restrict > access to root or users from the same namespace or I guess add interfaces > to allow different namespaces to see the workers and share them. > > >> No need to fix funky things like moving the fd between >> pid namespaces while also creating/destroying workers, but let's >> document it's not supported. > > Ok. I'll add a comment. >