On Mon, Mar 27, 2023 at 09:17:06PM -0500, Mike Christie wrote:
> The following patches were built over linux-next, which contains various
> vhost patches in mst's tree and the vhost_task patchset in Christian
> Brauner's tree:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git
>
> kernel.user_worker branch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=kernel.user_worker

Looks like it's going to miss this merge cycle. Hopefully in the next one.

> The latter patchset handles the review comment for the patches in this
> thread to make sure that the worker threads we create are accounted for
> in the parent process's NPROC limit. Those patches are scheduled to be
> sent to Linus for 6.4.
>
> The patches in this patchset allow us to support multiple vhost workers
> per device. The design is a modified version of Stefan's original idea,
> where userspace has the kernel create a worker and we pass back the pid.
> In this version, instead of passing the pid between user and kernel
> space, we use a worker_id, which is just an integer managed by the vhost
> driver, and we allow userspace to create and free workers and then
> attach them to virtqueues at setup time.
>
> All review comments from the past reviews should be handled. If I didn't
> reply to a review comment, I agreed with the comment and should have
> handled it in this posting. Let me know if I missed one.
>
> Results:
> --------
>
> fio jobs        1       2       4       8       12      16
> ----------------------------------------------------------
> 1 worker        160k    488k    -       -       -       -
> worker per vq   160k    310k    620k    1300k   1836k   2326k
>
> Notes:
> 0. This used a simple fio command:
>
>    fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k \
>        --ioengine=libaio --iodepth=128 --numjobs=$JOBS_ABOVE
>
>    and I used a VM with 16 vCPUs and 16 virtqueues.
>
> 1. The patches were tested with LIO's emulate_pr=0, which drops the
>    LIO PR lock use. This was a bottleneck at around 12 vqs/jobs.
>
> 2. Because we have a hard limit of 1024 cmds, if num jobs * iodepth
>    was greater than 1024, I would decrease iodepth. So 12 jobs used 85
>    cmds, and 16 used 64.
>
> 3. The perf issue above at 2 jobs is because when we only have 1 worker
>    we execute more cmds per vhost_work, due to all vqs funneling to one
>    worker. This results in fewer context switches and better batching
>    without having to tweak any settings. I'm working on patches to add
>    back batching during LIO completion and do polling on the submission
>    side.
>
> We will still want the threading patches, because if we batch at the fio
> level plus use the vhost threading patches, we can see a big boost like
> the one below. So hopefully doing it at the kernel level will allow apps
> to just work without having to be smart like fio.
>
> fio using io_uring and batching with the iodepth_batch* settings:
>
> fio jobs        1       2       4       8       12      16
> -------------------------------------------------------------
> 1 worker        494k    520k    -       -       -       -
> worker per vq   496k    878k    1542k   2436k   2304k   2590k
>
>
> V6:
> - Rebase against vhost_task patchset.
> - Used xa instead of idr.
>
> V5:
> - Rebase against user_worker patchset.
> - Rebase against flush patchset.
> - Redo vhost-scsi tmf flush handling so it doesn't access vq->worker.
>
> V4:
> - Fix vhost-vsock VSOCK_VQ_RX use.
> - Name functions called directly by ioctl cmds to match the ioctl cmd.
> - Break up VHOST_SET_VRING_WORKER into a new, free and attach cmd.
> - Document worker lifetime, and cgroup, namespace, mm, rlimit
>   inheritance; make it clear we currently only support sharing within
>   the device.
> - Add support to attach workers while IO is running.
> - Instead of passing a pid_t of the kernel thread, pass an int allocated
>   by the vhost layer with an idr.
>
> V3:
> - Fully convert vhost code to use vq based APIs instead of leaving it
>   half per dev and half per vq.
> - Rebase against kernel worker API.
> - Drop delayed worker creation. We always create the default worker at
>   VHOST_SET_OWNER time. Userspace can create and bind workers after that.
>
> V2:
> - Change the loop in which we take a refcount to the worker.
> - Replaced pid == -1 with a define.
> - Fixed tabbing/spacing coding style issues.
> - Use a hash instead of a list to look up workers.
> - I dropped the patch that added an ioctl cmd to get a vq's worker's
>   pid. I saw we might do a generic netlink interface instead.
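
For anyone who wants to poke at this from userspace, my reading of the
flow is roughly the sketch below: create a worker per virtqueue and
attach it before starting the rings. The ioctl names (VHOST_NEW_WORKER,
VHOST_ATTACH_VRING_WORKER) and the struct layouts are taken from my
reading of the series, so treat them as illustrative rather than the
final uapi.

/*
 * Minimal sketch, assuming a kernel and uapi headers with this series
 * applied. Names and layouts may differ in the final version.
 */
#include <sys/ioctl.h>
#include <linux/vhost.h>

static int setup_worker_per_vq(int vhost_fd, unsigned int nvqs)
{
	unsigned int i;

	/* The default worker is still created at VHOST_SET_OWNER time. */
	if (ioctl(vhost_fd, VHOST_SET_OWNER, NULL) < 0)
		return -1;

	for (i = 0; i < nvqs; i++) {
		struct vhost_worker_state w = { 0 };
		struct vhost_vring_worker vw = { 0 };

		/* Ask the kernel to create a worker; it returns a worker_id. */
		if (ioctl(vhost_fd, VHOST_NEW_WORKER, &w) < 0)
			return -1;

		/* Bind virtqueue i to the worker we just created. */
		vw.index = i;
		vw.worker_id = w.worker_id;
		if (ioctl(vhost_fd, VHOST_ATTACH_VRING_WORKER, &vw) < 0)
			return -1;
	}
	return 0;
}

vhost_fd here would come from opening e.g. /dev/vhost-scsi, and
presumably unused workers get freed with the matching free cmd or when
the device fd is closed.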