Re: [RFC PATCH 5/8] vhost: allow userspace to bind vqs to CPUs

Jason Wang <jasowang@xxxxxxxxxx> · Tue, 8 Dec 2020 10:30:13 +0800

On 2020/12/8 2:31 AM, Mike Christie wrote:
On 12/6/20 10:27 PM, Jason Wang wrote:

On 2020/12/5 12:32 AM, Mike Christie wrote:
On 12/4/20 2:09 AM, Jason Wang wrote:

On 2020/12/4 3:56 PM, Mike Christie wrote:
+static long vhost_vring_set_cpu(struct vhost_dev *d, struct vhost_virtqueue *vq,
+                void __user *argp)
+{
+    struct vhost_vring_state s;
+    int ret = 0;
+
+    if (vq->private_data)
+        return -EBUSY;
+
+    if (copy_from_user(&s, argp, sizeof s))
+        return -EFAULT;
+
+    if (s.num == -1) {
+        vq->cpu = s.num;
+        return 0;
+    }
+
+    if (s.num >= nr_cpu_ids)
+        return -EINVAL;
+
+    if (!d->ops || !d->ops->get_workqueue)
+        return -EINVAL;
+
+    if (!d->wq)
+        d->wq = d->ops->get_workqueue();
+    if (!d->wq)
+        return -EINVAL;
+
+    vq->cpu = s.num;
+    return ret;
+}
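
For readers following the checks in the hunk above, here is a minimal userspace sketch of the same validation order. It is only a model, not the kernel code: `struct vq_model`, `model_set_cpu()`, and the `nr_cpus` and `busy` parameters are hypothetical stand-ins for `vq`, `nr_cpu_ids`, and `vq->private_data`, and the workqueue lookup is left out. It assumes `vq->cpu` is a signed int. Since `num` in `struct vhost_vring_state` is an `unsigned int`, the `-1` "unbind" value arrives as `UINT_MAX`, which is why that check has to come before the range check.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-in for the per-vq field the patch adds; -1 means "not bound". */
struct vq_model {
    int cpu;
};

/*
 * Userspace model of the checks in vhost_vring_set_cpu().
 *   num     - value passed in vhost_vring_state.num (unsigned int in the UAPI)
 *   nr_cpus - stand-in for the kernel's nr_cpu_ids
 *   busy    - stand-in for vq->private_data being set (backend already attached)
 */
static int model_set_cpu(struct vq_model *vq, unsigned int num,
                         unsigned int nr_cpus, int busy)
{
    if (busy)
        return -EBUSY;

    /* -1 from userspace shows up as UINT_MAX and clears the binding. */
    if (num == (unsigned int)-1) {
        vq->cpu = -1;
        return 0;
    }

    /* Anything at or beyond the CPU count is rejected. */
    if (num >= nr_cpus)
        return -EINVAL;

    vq->cpu = (int)num;
    return 0;
}
```

Note that nothing in this sequence checks the caller's privileges, which is the point of the question that follows.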

So one question here. Who is in charge of doing this set_cpu? Note 
that sched_setaffinity(2) requires CAP_SYS_NICE to work, so I 
wonder whether or not it's legal for unprivileged Qemu to do this.
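
As background for the capability question, the sketch below shows the userspace call being referred to. Per the sched_setaffinity(2) man page, a thread may change its own mask without extra privileges; CAP_SYS_NICE comes into play when the target thread belongs to a different user, which would be the case for kernel-side workers. The helper name `pin_self_to_first_allowed_cpu()` is hypothetical and the code only touches the calling thread.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Pin the calling thread to the first CPU it is already allowed to run on,
 * verify the new mask, then restore the original one.
 * Returns the chosen CPU number, or -1 on failure.
 */
static int pin_self_to_first_allowed_cpu(void)
{
    cpu_set_t orig, one, now;
    int cpu, chosen = -1;

    if (sched_getaffinity(0, sizeof(orig), &orig) != 0)
        return -1;

    for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &orig)) {
            chosen = cpu;
            break;
        }
    }
    if (chosen < 0)
        return -1;

    CPU_ZERO(&one);
    CPU_SET(chosen, &one);
    /* pid 0 means "the calling thread"; no capability is needed for that. */
    if (sched_setaffinity(0, sizeof(one), &one) != 0)
        return -1;

    /* Read the mask back and confirm it holds exactly the chosen CPU. */
    if (sched_getaffinity(0, sizeof(now), &now) != 0 ||
        !CPU_ISSET(chosen, &now) || CPU_COUNT(&now) != 1)
        chosen = -1;

    /* Put the original mask back so the caller is not left pinned. */
    if (sched_setaffinity(0, sizeof(orig), &orig) != 0)
        return -1;

    return chosen;
}
```

Passing the pid of a thread owned by another user instead of 0 is where an unprivileged caller would be expected to get EPERM.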

I was having qemu do it when it's setting up the vqs since it had 
the info there already.

Is it normally the tool driving qemu that performs the operations 
that require CAP_SYS_NICE?

My understanding is that it only matters for scheduling. And this patch 
wants to change the affinity, which should check that capability.

If so, then I see the interface needs to be changed.

Actually, if I read this patch correctly, it requires e.g. qemu to make 
the decision instead of the management layer. This may cause some 
trouble for e.g. the libvirt emulatorpin[1] implementation.

Let me make sure I understood you.

I thought qemu would just have a new property, and users would pass 
that in like they do for the number of queues setting. Then qemu would 
pass that to the kernel. The primary user I have to support at work 
does not use libvirt based tools so I thought that was a common point 
that would work for everyone.

I think we need to talk with the libvirt guys to see if it works for them. 
My understanding is that scheduling should be their responsibility, not 
qemu's.

For my work requirement, your emulatorpin and CAP_SYS_NICE comments 
mean that we want an interface that something other than qemu can use, 
right? So the tools would call directly into the kernel and not go 
through qemu, right?

Yes, usually qemu runs without any privileges. So could it be e.g. a sysfs 
interface or something else?

Thanks