On Thu, Nov 19, 2020 at 4:13 PM Mike Christie <michael.christie@xxxxxxxxxx> wrote:
>
> On 11/19/20 8:46 AM, Michael S. Tsirkin wrote:
> > On Wed, Nov 18, 2020 at 11:31:17AM +0000, Stefan Hajnoczi wrote:
> >>> My preference has been:
> >>>
> >>> 1. If we were to ditch cgroups, then add a new interface that would allow
> >>> us to bind threads to a specific CPU, so that it lines up with the guest's
> >>> mq to CPU mapping.
> >>
> >> A 1:1 vCPU/vq->CPU mapping isn't desirable in all cases.
> >>
> >> The CPU affinity is a userspace policy decision. The host kernel should
> >> provide a mechanism but not the policy. That way userspace can decide
> >> which workers are shared by multiple vqs and on which physical CPUs they
> >> should run.
> >
> > So if we let userspace dictate the threading policy then I think binding
> > vqs to userspace threads and running there makes the most sense,
> > no need to create the threads.
> >
>
> Just to make sure I am on the same page, in one of the first postings of
> this set at the bottom of the mail:
>
> https://www.spinics.net/lists/linux-scsi/msg148322.html
>
> I asked about a new interface and had done something more like what
> Stefan posted:
>
> struct vhost_vq_worker_info {
>     /*
>      * The pid of an existing vhost worker that this vq will be
>      * assigned to. When pid is 0 the virtqueue is assigned to the
>      * default vhost worker. When pid is -1 a new worker thread is
>      * created for this virtqueue. When pid is -2 the virtqueue's
>      * worker thread is unchanged.
>      *
>      * If a vhost worker no longer has any virtqueues assigned to it
>      * then it will terminate.
>      *
>      * The pid of the vhost worker is stored to this field when the
>      * ioctl completes successfully. Use pid -2 to query the current
>      * vhost worker pid.
>      */
>     __kernel_pid_t pid;  /* in/out */
>
>     /* The virtqueue index */
>     unsigned int vq_idx; /* in */
> };
>
> This approach is simple and it allowed me to have userspace map queues
> and threads optimally for our setups.
>
> Note: Stefan, in response to your previous comment, I am just using my
> 1:1 mapping as an example and would make it configurable from userspace.
>
> In the email above are you guys suggesting to execute the SCSI/vhost
> requests in userspace? We should not do that because:
>
> 1. It negates part of what makes vhost fast where we do not have to kick
> out to userspace then back to the kernel.
>
> 2. It's not doable or becomes a crazy mess because vhost-scsi is tied to
> the scsi/target layer in the kernel. You can't process the scsi command
> in userspace since the scsi state machine and all its configuration info
> is in the kernel's scsi/target layer.
>
> For example, I was just the maintainer of the target_core_user module
> that hooks into LIO/target on the backend (vhost-scsi hooks in on the
> front end) and passes commands to userspace and there we have a
> semi-shadow state machine. It gets nasty to try and maintain/sync state
> between lio/target core in the kernel and in userspace. We also see the
> perf loss I mentioned in #1.

No, if I understand Michael correctly he has suggested a different
approach.

My suggestion was that the kernel continues to manage the worker threads
but an ioctl allows userspace to control the policy.

I think Michael is saying that the kernel shouldn't manage/create
threads. Userspace should create threads and then invoke an ioctl from
those threads. The ioctl will call into the vhost driver where it will
execute something similar to vhost_worker().
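Roughly, I imagine the userspace side would look something like the
sketch below. This is purely illustrative: the VHOST_RUN_WORKER name and
ioctl number are made up here, and no argument is passed yet.

#include <pthread.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Made-up ioctl number, purely for illustration */
#define VHOST_RUN_WORKER _IO(VHOST_VIRTIO, 0x90)

static void *worker_fn(void *arg)
{
        int vhost_fd = *(int *)arg;

        /*
         * Donate this thread to the vhost driver. The kernel would run
         * something like vhost_worker() here; which vqs this thread
         * services is discussed below.
         */
        if (ioctl(vhost_fd, VHOST_RUN_WORKER, NULL) < 0)
                perror("VHOST_RUN_WORKER");

        return NULL;
}

/*
 * The application creates one such thread per worker it wants, e.g.:
 *
 *     pthread_t tid;
 *     pthread_create(&tid, NULL, worker_fn, &vhost_fd);
 */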
So this ioctl will block while the kernel is using the thread to process
vqs.

What isn't clear to me is how to tell the kernel which vqs are processed
by a thread. We could try to pass that information into the ioctl. I'm
not sure what the cleanest solution is here.

Maybe something like:

struct vhost_run_worker_info {
    struct timespec *timeout;
    sigset_t *sigmask;

    /* List of virtqueues to process */
    unsigned nvqs;
    unsigned vqs[];
};

/*
 * This blocks until the timeout is reached, a signal is received, or
 * the vhost device is destroyed.
 */
int ret = ioctl(vhost_fd, VHOST_RUN_WORKER, &info);

As you can see, userspace isn't involved in processing the requests. It
just acts as a thread donor to the vhost driver.

We would want the VHOST_RUN_WORKER calls to be infrequent to avoid the
penalty of switching into the kernel, copying in the arguments, etc.

Michael: is this the kind of thing you were thinking of?

Stefan
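P.S. In case a concrete example helps, worker_fn() from my sketch above
might then end up looking roughly like this (still illustrative only;
the vq numbers are arbitrary and <stdlib.h> is needed for calloc()):

static void *worker_fn(void *arg)
{
        int vhost_fd = *(int *)arg;
        size_t nvqs = 2;
        struct vhost_run_worker_info *info;

        info = calloc(1, sizeof(*info) + nvqs * sizeof(info->vqs[0]));
        if (!info)
                return NULL;

        info->timeout = NULL;   /* no timeout: run until a signal or
                                   device destruction */
        info->sigmask = NULL;   /* keep the thread's current signal mask */
        info->nvqs = nvqs;
        info->vqs[0] = 0;       /* this thread services vq 0 and vq 1 */
        info->vqs[1] = 1;

        if (ioctl(vhost_fd, VHOST_RUN_WORKER, info) < 0)
                perror("VHOST_RUN_WORKER");

        free(info);
        return NULL;
}

That way all of the policy (how many worker threads, which vqs each one
services, and CPU affinity via pthread_setaffinity_np()) stays in
userspace, which is the mechanism/policy split I was getting at above.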