Re: [PATCH v6 11/11] vhost: allow userspace to create workers

On Wed, Apr 5, 2023 at 7:08 AM Mike Christie
<michael.christie@xxxxxxxxxx> wrote:
>
> On 4/4/23 3:00 AM, Jason Wang wrote:
> >>
> >> -static void vhost_worker_free(struct vhost_dev *dev)
> >> +static void vhost_worker_put(struct vhost_dev *dev, struct vhost_worker *worker)
> >>  {
> >> -       struct vhost_worker *worker = dev->worker;
> >> -
> >>         if (!worker)
> >>                 return;
> >>
> >> -       dev->worker = NULL;
> >> +       if (!refcount_dec_and_test(&worker->refcount))
> >> +               return;
> >> +
> >>         WARN_ON(!llist_empty(&worker->work_list));
> >>         vhost_task_stop(worker->vtsk);
> >>         kfree(worker);
> >>  }
> >>
> >> +static void vhost_vq_detach_worker(struct vhost_virtqueue *vq)
> >> +{
> >> +       if (vq->worker)
> >
> > What happens to the pending work that queues for the old worker?
>
> I didn't think there would be any work queued at this time. I see your comment
> below that my assumption about the backend being set is wrong. Will
> discuss it down there.
>
>
> >>
> >> +/* Caller must have device and virtqueue mutex */
> >> +static void __vhost_vq_attach_worker(struct vhost_virtqueue *vq,
> >> +                                    struct vhost_worker *worker)
> >> +{
> >> +       refcount_inc(&worker->refcount);
> >> +       vhost_vq_detach_worker(vq);
> >> +       vq->worker = worker;
> >
> > What happens if there's a pending flush in a specific device (e.g in
> > vhost_scsi_tmf_resp_work())?
>
> We wouldn't hit that specific case where we are running the above code and
> vhost_scsi_tmf_resp_work.
>
> Either:
>
> 1. The backend is NULL and has never been set, and so we would never have
> called vhost_scsi_tmf_resp_work, because it can only be called after the
> backend is set.
>
> 2. If the backend has been set and vhost_scsi_tmf_resp_work has
> run or is running, then we would not have called __vhost_vq_attach_worker
> from vhost_vq_attach_worker, because it would see vhost_vq_get_backend
> returning a non-NULL value.
>
> If vhost_scsi later sets the backend to NULL, then vhost_scsi_clear_endpoint
> will have made sure the flush has completed by the time the clear function returns.
> It does that while holding the device mutex, so when we run __vhost_vq_attach_worker
> it will only see a vq/worker with no flushes in progress.

Ok.

>
> For the general case of whether we can be doing a vhost_dev_flush/vhost_work_flush_on
> and __vhost_vq_attach_worker at the same time, I thought we are ok as well, because I
> thought we currently have to hold the device mutex when we flush, so those can't
> race with ioctl calls to vhost_vq_attach_worker since we hold the dev mutex
> during that ioctl.

I'm not sure I understand here, but we can't assume the caller of
vhost_work_queue() is in ioctl context; see
vhost_zerocopy_callback(). But since you want to limit this to before
set_backend, another question arises: do we really need the dynamic
attaching/creating in this case? How about a single ioctl that is
used to build the queue-to-workers mapping instead?
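
For illustration, a rough sketch of what such a one-shot mapping ioctl could
look like (the struct names, fields, and ioctl below are hypothetical, not
something in this series):

/*
 * Hypothetical uAPI sketch only: describe the whole vring-to-worker
 * mapping in one call, issued before the backend is set, so there is
 * no attach/detach while work can already be queued.
 */
struct vhost_vring_worker_map_entry {
        unsigned int vring_idx;         /* which virtqueue */
        unsigned int worker_id;         /* worker that services it */
};

struct vhost_worker_map {
        unsigned int nr_entries;
        struct vhost_vring_worker_map_entry entries[];
};

/*
 * e.g. ioctl(vhost_fd, VHOST_SET_WORKER_MAP, map), validated and
 * applied atomically while no backend is set.
 */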

> Or we call flush from the file_operations release function
> so the device is closed and can't race with ioctls.
>
> >
> > Does this mean we need to hold vq mutex when doing the flush? (which
> > seems not the case of vhost_scsi_tmf_resp_work()).
> >
> >> +}
> >> +
> >> +/* Caller must have device and virtqueue mutex */
> >> +static int vhost_vq_attach_worker(struct vhost_virtqueue *vq,
> >> +                                 struct vhost_vring_worker *info)
> >> +{
> >> +       unsigned long index = info->worker_id;
> >> +       struct vhost_dev *dev = vq->dev;
> >> +       struct vhost_worker *worker;
> >> +
> >> +       if (!dev->use_worker)
> >> +               return -EINVAL;
> >> +
> >> +       /*
> >> +        * We don't support setting a worker on an active vq to make flushing
> >> +        * and removal simple.
> >> +        */
> >> +       if (vhost_vq_get_backend(vq))
> >> +               return -EBUSY;
> >
> > This assumes the worker won't be started until the backend is set
> > which is not true.
>
> I can see how we get flushes before setting the backend, but I thought we are
> ok because we have the device mutex held.
>
> What are the other possible cases? Is one case that we could get a
> VHOST_SET_VRING_KICK before setting the backend, so vhost_poll_start starts
> allowing vhost_poll_queue before the backend is set?

Yes, and it's not necessarily the kick; the networking core could wake
up workers before set_backend.

> Is there any thing else?

I haven't found time to check for others, but what I want to say is: if we want
this assumption, we need to document it and change the devices that
are affected by this change.

>
> I'll fix this issue.
>
>
> >> +
> >> +/* Caller must have device mutex */
> >> +static int vhost_free_worker(struct vhost_dev *dev,
> >> +                            struct vhost_worker_state *info)
> >> +{
> >> +       unsigned long index = info->worker_id;
> >> +       struct vhost_worker *worker;
> >> +
> >> +       if (!dev->use_worker)
> >> +               return -EINVAL;
> >> +
> >> +       worker = xa_find(&dev->worker_xa, &index, UINT_MAX, XA_PRESENT);
> >
> > So we use int for worker_id which conflicts with UINT_MAX here?
>
> I switched from idr in the last versions to xa last second and added this mistake.
> Will fix.
>
>
> >
> > struct vhost_worker_state {
> >         /*
> >          * For VHOST_NEW_WORKER the kernel will return the new vhost_worker id.
> >          * For VHOST_FREE_WORKER this must be set to the id of the vhost_worker
> >          * to free.
> >          */
> >         int worker_id;
> > };
> >
> > A side effect of using xa_index directly is that userspace can guess
> > the xa_index of the default worker and free it here, is this intended?
> I don't need the functionality. It was only something that I didn't
> think mattered much, because you can only free the worker if it's not in
> use by any virtqueues, so I didn't add any special code to handle it.
> The user would have had to do an ATTACH to replace the default worker,
> and only then could it free it, so it works like the other workers.

It's about the robustness of the uAPI:

struct vhost_worker_state {
        /*
         * For VHOST_NEW_WORKER the kernel will return the new vhost_worker id.
         * For VHOST_FREE_WORKER this must be set to the id of the vhost_worker
         * to free.
         */
        int worker_id;
};

It looks like the worker_id is the one userspace gets from
VHOST_NEW_WORKER. If so, we should hide the default worker.
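
One way to do that (just a sketch; it assumes the default worker's id is
recorded at creation time, and the field name below is made up) is to filter
it out of the user-facing lookups:

/*
 * Sketch only: keep the default worker out of reach of
 * VHOST_FREE_WORKER/VHOST_ATTACH_VRING_WORKER lookups.  Assumes
 * dev->worker_xa holds the workers and dev->default_worker_id
 * (hypothetical field) is set when the default worker is created.
 */
static struct vhost_worker *vhost_worker_lookup(struct vhost_dev *dev,
                                                unsigned int id)
{
        if (id == dev->default_worker_id)
                return NULL;    /* default worker is invisible to userspace */

        return xa_load(&dev->worker_xa, id);
}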


>
> > Should we hide the default worker from xa?
>
> I can change it if you are worried about future problems.
>
>
> >> -       default:
> >> +       case VHOST_ATTACH_VRING_WORKER:
> >> +               if (copy_from_user(&w, argp, sizeof(w))) {
> >> +                       r = -EFAULT;
> >> +                       break;
> >> +               }
> >> +               r = vhost_vq_attach_worker(vq, &w);
> >> +               if (!r && copy_to_user(argp, &w, sizeof(w)))
> >> +                       r = -EFAULT;
> >> +               break;
> >
> > It's a little odd that we have new and attach but only a free.
>
> Do you mean we would also want a detach? I've been debating that.
> I'm not sure what the user would want, though.
>
> Would it leave the virtqueue with no worker? Or, does it pick the first
> worker it finds? If there is no worker because we did the former or
> because userspace detached all of them, then do we just drop work that
> gets queued?
>
> It seemed like a user would never want to drop work, so I made it so
> you can only tell it to use new workers (attach, which I guess is more
> like a swap-worker operation)

So it swaps in the new worker and puts (frees) the old one, indeed?

static void vhost_vq_detach_worker(struct vhost_virtqueue *vq)
{
        if (vq->worker)
                vhost_worker_put(vq->dev, vq->worker);
        vq->worker = NULL;
}

That makes me wonder in which case we should use VHOST_FREE_WORKER?

Thanks

> so we always have a worker and I don't have to
> worry about dropping work.
>
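
For reference, a minimal userspace sketch of the flow being discussed, under
the assumption that attach acts as a swap and that struct vhost_vring_worker
carries a vring index plus a worker_id (the index field name and the exact
refcount behaviour are guesses, not verified against the final uAPI):

#include <sys/ioctl.h>
#include <linux/vhost.h>        /* would provide the new ioctls/structs */

/* Sketch of the intended userspace flow (assumptions noted above). */
static int setup_worker(int vhost_fd)
{
        struct vhost_worker_state state = {};
        struct vhost_vring_worker w = {};

        /* Create a worker; the kernel fills in the new worker_id. */
        if (ioctl(vhost_fd, VHOST_NEW_WORKER, &state))
                return -1;

        /* Attach (swap) vq 0 to the new worker; the previously
         * attached worker is put.
         */
        w.index = 0;            /* vring index, field name assumed */
        w.worker_id = state.worker_id;
        if (ioctl(vhost_fd, VHOST_ATTACH_VRING_WORKER, &w))
                return -1;

        /*
         * Later, after every vq has been swapped away from this worker,
         * VHOST_FREE_WORKER drops the remaining reference so the worker
         * is actually destroyed:
         *
         *      ioctl(vhost_fd, VHOST_FREE_WORKER, &state);
         */
        return 0;
}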

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization



