On 11/17/20 10:40 AM, Stefan Hajnoczi wrote:
> On Thu, Nov 12, 2020 at 05:18:59PM -0600, Mike Christie wrote:
>> The following kernel patches were made over Michael's vhost branch:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/log/?h=vhost
>>
>> and the vhost-scsi bug fix patchset:
>>
>> https://lore.kernel.org/linux-scsi/20201112170008.GB1555653@stefanha-x1.localdomain/T/#t
>>
>> And the qemu patch was made over the qemu master branch.
>>
>> vhost-scsi currently supports multiple queues with the num_queues
>> setting, but we end up with a setup where the guest's scsi/block
>> layer can do a queue per vCPU and the layers below vhost can do
>> a queue per CPU. vhost-scsi will then create num_queues virtqueues,
>> but all IO gets sent to and completed on a single vhost-scsi thread.
>> After 2 - 4 vqs this becomes a bottleneck.
>>
>> This patchset allows us to create a worker thread per IO vq, so we
>> can better utilize multiple CPUs with the multiple queues. It
>> implements Jason's suggestion to create the initial worker like
>> normal, then create the extra workers for IO vqs with the
>> VHOST_SET_VRING_ENABLE ioctl command added in this patchset.
>
> How does userspace find out the tids and set their CPU affinity?
>

When we create the worker thread we add it to the device owner's cgroup,
so we end up inheriting those settings like affinity.

However, are you asking more about finer control, like if the guest is
doing mq and an mq hw queue is bound to cpu0, would it perform better if
we could bind that vq's vhost worker thread to cpu0? I think the problem
there is that if you are in the cgroup, we can't set a specific thread's
CPU affinity to just one specific CPU. So you can either do cgroups or
not.

> What is the meaning of the new VHOST_SET_VRING_ENABLE ioctl? It doesn't
> really "enable" or "disable" the vq, requests are processed regardless.
>

Yeah, I agree. The problem I've mentioned before is:

1. For net and vsock, it's not useful because the vqs are hard coded in
the kernel and userspace, so you can't disable a vq and you never need
to enable one.

2. vdpa has its own enable ioctl.

3. For scsi, because we are already doing multiple vqs based on the
num_queues value, we have to have some sort of compat support and code
to detect whether userspace is even going to send the new ioctl. In this
patchset, compat just meant enabling/disabling the extra worker thread
functionality for a vq. We will still use the vq if userspace set it up.

> The purpose of the ioctl isn't clear to me because the kernel could
> automatically create 1 thread per vq without a new ioctl. On the other
> hand, if userspace is supposed to control worker threads then a
> different interface would be more powerful:
>

My preference has been:

1. If we were to ditch cgroups, then add a new interface that would
allow us to bind threads to a specific CPU, so that it lines up with the
guest's mq to CPU mapping (see the rough sketch of the userspace side
below).

2. If we continue with cgroups, then I think just creating the worker
threads from vhost_scsi_set_endpoint is best, because that is the point
where we do the other final vq setup ops, vhost_vq_set_backend and
vhost_vq_init_access.

For option number 2 it would be simple.
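To make option 1 concrete, the userspace side would be something like
the sketch below. It assumes we added some way for userspace to learn a
vq worker's tid, which is exactly the interface that does not exist
today, so the tid is just passed in on the command line here
(pin_vq_worker is a made up helper name):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Rough sketch: bind one vhost vq worker thread to the CPU that the
 * guest's mq hw queue for that vq is tied to.  How userspace learns
 * worker_tid is the missing piece; there is no interface for it yet.
 */
static int pin_vq_worker(pid_t worker_tid, int cpu)
{
	cpu_set_t mask;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);

	if (sched_setaffinity(worker_tid, sizeof(mask), &mask)) {
		perror("sched_setaffinity");
		return -1;
	}
	return 0;
}

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <worker_tid> <cpu>\n", argv[0]);
		return 1;
	}
	return pin_vq_worker(atoi(argv[1]), atoi(argv[2])) ? 1 : 0;
}

Option 2 does not need any new userspace interface, since the kernel
would create the workers itself from vhost_scsi_set_endpoint.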
Instead of the vring enable patches:

[PATCH 08/10] vhost: move msg_handler to new ops struct
[PATCH 09/10] vhost: add VHOST_SET_VRING_ENABLE support
[PATCH 10/10] vhost-scsi: create a worker per IO vq

and

[PATCH 1/1] qemu vhost scsi: add VHOST_SET_VRING_ENABLE support

we could do this patch like I had done in previous versions:

>From bcc4c29c28daf04679ce6566d06845b9e1b31eb4 Mon Sep 17 00:00:00 2001
From: Mike Christie <michael.christie@xxxxxxxxxx>
Date: Wed, 11 Nov 2020 22:50:56 -0600
Subject: vhost scsi: multiple worker support

This patch creates a worker per IO vq to fix an issue where after 2
vqs and/or multiple luns the single worker thread becomes a bottleneck
due to the multiple queues/luns trying to execute/complete their IO on
the same thread/CPU. This patch allows us to better match the guest's
and lower levels' multiqueue setups.

Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx>
---
 drivers/vhost/scsi.c | 41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 44c108a..2c119d3 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1640,9 +1640,18 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 			vq = &vs->vqs[i].vq;
 			if (!vhost_vq_is_setup(vq))
 				continue;
+			/*
+			 * For compat, we have the evt, ctl and first IO vq
+			 * share worker0 like is setup by default. Additional
+			 * vqs get their own worker.
+			 */
+			if (i > VHOST_SCSI_VQ_IO) {
+				if (vhost_vq_worker_add(&vs->dev, vq))
+					goto cleanup_vq;
+			}
 
 			if (vhost_scsi_setup_vq_cmds(vq, vq->num))
-				goto destroy_vq_cmds;
+				goto cleanup_vq;
 		}
 
 		for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
@@ -1666,10 +1675,14 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 	vs->vs_tpg = vs_tpg;
 	goto out;
 
-destroy_vq_cmds:
-	for (i--; i >= VHOST_SCSI_VQ_IO; i--) {
-		if (!vhost_vq_get_backend(&vs->vqs[i].vq))
-			vhost_scsi_destroy_vq_cmds(&vs->vqs[i].vq);
+cleanup_vq:
+	for (; i >= VHOST_SCSI_VQ_IO; i--) {
+		if (vhost_vq_get_backend(&vs->vqs[i].vq))
+			continue;
+
+		if (i > VHOST_SCSI_VQ_IO)
+			vhost_vq_worker_remove(&vs->dev, &vs->vqs[i].vq);
+		vhost_scsi_destroy_vq_cmds(&vs->vqs[i].vq);
 	}
 undepend:
 	for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
@@ -1752,14 +1765,24 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 			mutex_lock(&vq->mutex);
 			vhost_vq_set_backend(vq, NULL);
 			mutex_unlock(&vq->mutex);
+		}
+		vhost_scsi_flush(vs);
+
+		for (i = VHOST_SCSI_VQ_IO; i < VHOST_SCSI_MAX_VQ; i++) {
+			vq = &vs->vqs[i].vq;
+			if (!vhost_vq_is_setup(vq))
+				continue;
 			/*
-			 * Make sure cmds are not running before tearing them
-			 * down.
-			 */
-			vhost_scsi_flush(vs);
+			 * We only remove the extra workers we created in case
+			 * this is for a reboot. The default worker will be
+			 * removed at dev cleanup.
+			 */
+			if (i > VHOST_SCSI_VQ_IO)
+				vhost_vq_worker_remove(&vs->dev, vq);
 			vhost_scsi_destroy_vq_cmds(vq);
 		}
 	}
+
 	/*
 	 * Act as synchronize_rcu to make sure access to
 	 * old vs->vs_tpg is finished.
-- 
1.8.3.1
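And to show what I mean by no new ioctl being needed: with this
approach the userspace side is nothing more than the existing
vhost-scsi setup sequence. A stripped down sketch (error handling, the
abi version check, and the memory table and per-vq VHOST_SET_VRING_*
setup a real user like qemu does are all omitted, and the wwpn is made
up):

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vhost.h>

int main(void)
{
	struct vhost_scsi_target target;
	int fd;

	fd = open("/dev/vhost-scsi", O_RDWR);
	if (fd < 0)
		return 1;

	ioctl(fd, VHOST_SET_OWNER, NULL);

	/* ... VHOST_SET_MEM_TABLE and VHOST_SET_VRING_* calls go here ... */

	memset(&target, 0, sizeof(target));
	strncpy(target.vhost_wwpn, "naa.600140505e3f1234",
		sizeof(target.vhost_wwpn) - 1);

	/*
	 * With the patch above, this is where the extra per IO vq workers
	 * get created (and they get removed again when userspace does
	 * VHOST_SCSI_CLEAR_ENDPOINT).
	 */
	ioctl(fd, VHOST_SCSI_SET_ENDPOINT, &target);

	close(fd);
	return 0;
}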