Hi Mike,

On Fri, Apr 28, 2023 at 11:35:20AM -0500, michael.christie@xxxxxxxxxx wrote:
> The following patches were built over Linus's tree. They allow us to
> support multiple vhost worker tasks per device. The design is a modified
> version of Stefan's original idea where userspace has the kernel create a
> worker and we pass back the pid. In this version, instead of passing the
> pid between user and kernel space, we use a worker_id, which is just an
> integer managed by the vhost driver, and we allow userspace to create and
> free workers and then attach them to virtqueues at setup time.
>
> All review comments from the past reviews should be handled. If I didn't
> reply to a review comment, I agreed with the comment and should have
> handled it in this posting. Let me know if I missed one.
>
> Results:
> --------
>
> fio jobs        1      2      4      8      12     16
> ------------------------------------------------------
> 1 worker        160k   488k   -      -      -      -
> worker per vq   160k   310k   620k   1300k  1836k  2326k
>
> Notes:
>
> 0. This used a simple fio command:
>
>    fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k \
>        --ioengine=libaio --iodepth=128 --numjobs=$JOBS_ABOVE
>
>    and I used a VM with 16 vCPUs and 16 virtqueues.
>
> 1. The patches were tested with LIO's emulate_pr=0, which drops the
>    LIO PR lock use. This was a bottleneck at around 12 vqs/jobs.
>
> 2. Because we have a hard limit of 1024 cmds, if num jobs * iodepth
>    was greater than 1024 I would decrease the iodepth. So 12 jobs used
>    an iodepth of 85 cmds each, and 16 used 64.
>
> 3. The perf issue above at 2 jobs is because when we only have 1 worker
>    we execute more cmds per vhost_work due to all vqs funneling to one
>    worker. I have 2 patches that fix this. One is to just submit from the
>    worker thread instead of kicking off to a workqueue like the vhost
>    block patches do. I'll post this after the worker support is merged.
>    I'm also working on patches to add back batching during LIO completion
>    and to do polling on the submission side.
>
> We will still want the threading patches, because if we batch at the fio
> level and also use the vhost threading patches, we can see a big boost
> like below. So hopefully doing it in the kernel will allow apps to just
> work without having to be smart like fio.
>
> fio using io_uring and batching with the iodepth_batch* settings:
>
> fio jobs        1      2      4      8      12     16
> -------------------------------------------------------
> 1 worker        494k   520k   -      -      -      -
> worker per vq   496k   878k   1542k  2436k  2304k  2590k
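For reference, batched submission in fio is controlled by the iodepth_batch*
options. The exact settings behind the numbers above aren't spelled out, so
the command below is only a sketch that takes the earlier libaio command and
switches it to io_uring with example batch sizes:

    fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k \
        --ioengine=io_uring --iodepth=128 \
        --iodepth_batch_submit=16 --iodepth_batch_complete_min=1 \
        --iodepth_batch_complete_max=16 --numjobs=$JOBS_ABOVE

Submitting and reaping in batches means more requests arrive per virtqueue
kick, so each vhost worker wakeup handles more commands, which is the same
effect the in-kernel batching work mentioned in note 3 aims to give
applications without them having to be smart about it.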
Could you rebase on the latest rc and repost please? Thanks!

> V7:
> - Add more comments about assumptions.
> - Drop refcounting and just use an attachment_cnt variable, so there
>   is no confusion about when a worker is freed.
> - Do an opt-in model, because vsock has an issue where it can queue works
>   before it's running and it doesn't even need multiple workers, so there
>   is no point in changing the driver or core code.
> - Add back the get worker ioctl.
> - Broke up the last patches to make them easier to read.
>
> V6:
> - Rebase against the vhost_task patchset.
> - Used xa instead of idr.
>
> V5:
> - Rebase against the user_worker patchset.
> - Rebase against the flush patchset.
> - Redo vhost-scsi tmf flush handling so it doesn't access vq->worker.
>
> V4:
> - Fix vhost-vsock VSOCK_VQ_RX use.
> - Name functions called directly by ioctl cmds to match the ioctl cmd.
> - Break up VHOST_SET_VRING_WORKER into new, free and attach cmds.
> - Document worker lifetime, and cgroup, namespace, mm and rlimit
>   inheritance, and make it clear we currently only support sharing within
>   the device.
> - Add support to attach workers while IO is running.
> - Instead of passing a pid_t of the kernel thread, pass an int allocated
>   by the vhost layer with an idr.
>
> V3:
> - Fully convert vhost code to use vq-based APIs instead of leaving it
>   half per dev and half per vq.
> - Rebase against the kernel worker API.
> - Drop delayed worker creation. We always create the default worker at
>   VHOST_SET_OWNER time. Userspace can create and bind workers after that.
>
> V2:
> - Change the loop that we take a refcount to the worker in.
> - Replaced pid == -1 with a define.
> - Fixed a tabbing/spacing coding style issue.
> - Use a hash instead of a list to look up workers.
> - I dropped the patch that added an ioctl cmd to get a vq's worker's
>   pid. I saw we might do a generic netlink interface instead.
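For anyone wanting to exercise the interface from userspace, the sketch below
shows roughly how the create/free/attach flow described in the cover letter
(and in the V4/V7 notes above) could be driven, binding one worker to each
virtqueue. The ioctl names (VHOST_NEW_WORKER, VHOST_FREE_WORKER,
VHOST_ATTACH_VRING_WORKER) and the struct layouts are assumptions based on
this series rather than confirmed, final UAPI, so treat it as illustrative
only:

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/*
 * Illustrative only: assumes this series' VHOST_NEW_WORKER,
 * VHOST_FREE_WORKER and VHOST_ATTACH_VRING_WORKER ioctls and the
 * vhost_worker_state/vhost_vring_worker structs are exposed via
 * <linux/vhost.h>; names and layouts may differ in the final UAPI.
 */
static int setup_per_vq_workers(int vhost_fd, unsigned int nvqs)
{
	unsigned int i;

	for (i = 0; i < nvqs; i++) {
		struct vhost_worker_state state = {};
		struct vhost_vring_worker vring_worker = {};

		/* Have the kernel create a worker and hand back its worker_id. */
		if (ioctl(vhost_fd, VHOST_NEW_WORKER, &state) < 0) {
			perror("VHOST_NEW_WORKER");
			return -1;
		}

		/* Bind virtqueue i to the worker that was just created. */
		vring_worker.index = i;
		vring_worker.worker_id = state.worker_id;
		if (ioctl(vhost_fd, VHOST_ATTACH_VRING_WORKER, &vring_worker) < 0) {
			perror("VHOST_ATTACH_VRING_WORKER");
			/* The unused worker can be released again. */
			ioctl(vhost_fd, VHOST_FREE_WORKER, &state);
			return -1;
		}
	}

	return 0;
}

The get worker ioctl added back in V7 would presumably fill in the same kind
of struct, letting userspace query which worker a given vring is currently
attached to.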