On 11/19/20 8:46 AM, Michael S. Tsirkin wrote:
On Wed, Nov 18, 2020 at 11:31:17AM +0000, Stefan Hajnoczi wrote:
My preference has been:
1. If we were to ditch cgroups, then add a new interface that would allow
us to bind threads to a specific CPU, so that it lines up with the guest's
mq to CPU mapping.
A 1:1 vCPU/vq->CPU mapping isn't desirable in all cases.
The CPU affinity is a userspace policy decision. The host kernel should
provide a mechanism but not the policy. That way userspace can decide
which workers are shared by multiple vqs and on which physical CPUs they
should run.
So if we let userspace dictate the threading policy, then I think binding
vqs to userspace threads and running there makes the most sense; there is
no need to create the threads.
Just to make sure I am on the same page: in one of the first postings of
this set, at the bottom of this mail:

https://www.spinics.net/lists/linux-scsi/msg148322.html

I asked about a new interface and had done something more like what
Stefan posted:
struct vhost_vq_worker_info {
	/*
	 * The pid of an existing vhost worker that this vq will be
	 * assigned to. When pid is 0 the virtqueue is assigned to the
	 * default vhost worker. When pid is -1 a new worker thread is
	 * created for this virtqueue. When pid is -2 the virtqueue's
	 * worker thread is unchanged.
	 *
	 * If a vhost worker no longer has any virtqueues assigned to it
	 * then it will terminate.
	 *
	 * The pid of the vhost worker is stored to this field when the
	 * ioctl completes successfully. Use pid -2 to query the current
	 * vhost worker pid.
	 */
	__kernel_pid_t pid;  /* in/out */

	/* The virtqueue index */
	unsigned int vq_idx; /* in */
};
This approach is simple, and it allowed userspace to map queues to
threads optimally for our setups.

Note: Stefan, in response to your previous comment, I am just using my
1:1 mapping as an example; the mapping would be configurable from
userspace.
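For reference, here is a rough sketch of how userspace could drive an
interface like that. It is only an illustration of the idea, not part of
the proposal: VHOST_SET_VQ_WORKER is a made-up ioctl name, and the struct
is the one above.

	#include <sys/ioctl.h>
	#include <errno.h>

	/*
	 * Hypothetical usage sketch: create one new worker for vq_a and
	 * then attach vq_b to that same worker, so the two vqs share a
	 * thread. VHOST_SET_VQ_WORKER is a placeholder ioctl name.
	 */
	static int share_worker(int vhost_fd, unsigned int vq_a,
				unsigned int vq_b)
	{
		struct vhost_vq_worker_info info = {
			.pid = -1,	/* -1: create a new worker for vq_a */
			.vq_idx = vq_a,
		};

		if (ioctl(vhost_fd, VHOST_SET_VQ_WORKER, &info) < 0)
			return -errno;

		/* info.pid now holds the new worker's pid; reuse it for vq_b */
		info.vq_idx = vq_b;
		if (ioctl(vhost_fd, VHOST_SET_VQ_WORKER, &info) < 0)
			return -errno;

		return 0;
	}

Other policies (one worker per vq, all vqs on the default worker, etc.)
would just be different sequences of the same ioctl.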
In the email above, are you guys suggesting that we execute the
SCSI/vhost requests in userspace? We should not do that because:
1. It negates part of what makes vhost fast: we do not have to kick
out to userspace and then back into the kernel.
2. It's either not doable or becomes a crazy mess, because vhost-scsi is
tied to the scsi/target layer in the kernel. You can't process the SCSI
command in userspace, since the SCSI state machine and all its
configuration info live in the kernel's scsi/target layer.
For example, I was just the maintainer of the target_core_user module,
which hooks into LIO/target on the backend (vhost-scsi hooks in on the
front end) and passes commands to userspace, where we have a semi-shadow
state machine. It gets nasty to try to maintain/sync state between the
LIO/target core in the kernel and userspace. We also see the perf loss I
mentioned in #1.