On 8/13/23 2:01 PM, Michael S. Tsirkin wrote:
> On Fri, Aug 11, 2023 at 01:51:36PM -0500, Mike Christie wrote:
>> On 8/10/23 1:57 PM, Michael S. Tsirkin wrote:
>>> On Sat, Jul 22, 2023 at 11:03:29PM -0500, michael.christie@xxxxxxxxxx wrote:
>>>> On 7/20/23 8:06 AM, Michael S. Tsirkin wrote:
>>>>> On Thu, Feb 02, 2023 at 05:25:17PM -0600, Mike Christie wrote:
>>>>>> For vhost workers we use the kthread API, which inherits its values from
>>>>>> and checks against the kthreadd thread. This results in the wrong RLIMITs
>>>>>> being checked, so while tools like libvirt try to control the number of
>>>>>> threads based on the nproc rlimit setting we can end up creating more
>>>>>> threads than the user wanted.
>>>>>>
>>>>>> This patch has us use the vhost_task helpers, which will inherit their
>>>>>> values/checks from the thread that owns the device, similar to if we did
>>>>>> a clone in userspace. The vhost threads will now be counted in the nproc
>>>>>> rlimits. And we get features like cgroups and mm sharing automatically,
>>>>>> so we can remove those calls.
>>>>>>
>>>>>> Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx>
>>>>>> Acked-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
>>>>>
>>>>>
>>>>> Hi Mike,
>>>>> So this seems to have caused a measurable regression in networking
>>>>> performance (about 30%). Take a look here, and there's a zip file
>>>>> with detailed measurements attached:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=2222603
>>>>>
>>>>> Could you take a look please?
>>>>> You can also ask the reporter questions there assuming you
>>>>> have or can create a (free) account.
>>>>>
>>>>
>>>> Sorry for the late reply. I just got home from vacation.
>>>>
>>>> The account creation link seems to be down. I keep getting an
>>>> "unable to establish SMTP connection to bz-exim-prod port 25" error.
>>>>
>>>> Can you give me Quan's email?
>>>>
>>>> I think I can replicate the problem. I just need some extra info from Quan:
>>>>
>>>> 1. Just double check that they are using RHEL 9 on the host running the VMs.
>>>> 2. The kernel config.
>>>> 3. Any tuning that was done. Is tuned running in the guest and/or the host
>>>> running the VMs, and what profile is being used in each?
>>>> 4. Number of vCPUs and virtqueues being used.
>>>> 5. Can they dump the contents of:
>>>>
>>>> /sys/kernel/debug/sched
>>>>
>>>> and
>>>>
>>>> sysctl -a
>>>>
>>>> on the host running the VMs.
>>>>
>>>> 6. With the 6.4 kernel, can they also run a quick test and tell me if setting
>>>> the scheduler to batch helps:
>>>>
>>>> ps -T -o comm,pid,tid $QEMU_THREAD
>>>>
>>>> then for each vhost thread do:
>>>>
>>>> chrt -b -p 0 $VHOST_THREAD
>>>>
>>>> Does that end up increasing perf? When I do this I see throughput go up by
>>>> around 50% vs 6.3 when the session count was 16 or more (16 was the number of
>>>> vCPUs and virtqueues per net device in the VM). Note that I'm not saying that
>>>> is a fix. It's just a difference I noticed when running some other tests.
>>>
>>>
>>> Mike, I'm unsure what to do at this point. Regressions are not nice,
>>> but if the kernel is released with the new userspace API we won't
>>> be able to revert. So what's the plan?
>>>
>>
>> I'm sort of stumped. I still can't replicate the problem out of the box. 6.3
>> and 6.4 perform the same for me. I've tried your setup and settings and with
>> different combos of using things like tuned and irqbalance.
>>
>> I can sort of force the issue. In 6.4, the vhost thread inherits its settings
>> from the parent thread.
>> In 6.3, the vhost thread inherits from kthreadd and we would then reset the
>> sched settings. So in 6.4, if I just tune the parent differently I can cause
>> different performance. If we want the 6.3 behavior we can do the patch below.
>>
>> However, I don't think you guys are hitting this, because you are just running
>> qemu from the normal shell and were not doing anything fancy with the sched
>> settings.
>>
>>
>> diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
>> index da35e5b7f047..f2c2638d1106 100644
>> --- a/kernel/vhost_task.c
>> +++ b/kernel/vhost_task.c
>> @@ -2,6 +2,7 @@
>>  /*
>>   * Copyright (C) 2021 Oracle Corporation
>>   */
>> +#include <uapi/linux/sched/types.h>
>>  #include <linux/slab.h>
>>  #include <linux/completion.h>
>>  #include <linux/sched/task.h>
>> @@ -22,9 +23,16 @@ struct vhost_task {
>>
>>  static int vhost_task_fn(void *data)
>>  {
>> +        static const struct sched_param param = { .sched_priority = 0 };
>>          struct vhost_task *vtsk = data;
>>          bool dead = false;
>>
>> +        /*
>> +         * Don't inherit the parent's sched info, so we maintain compat from
>> +         * when we used kthreads and it reset this info.
>> +         */
>> +        sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
>> +
>>          for (;;) {
>>                  bool did_work;
>>
>
> Yes, seems unlikely. Still, attach this to bugzilla so it can be
> tested?
>
> And what will help you debug? Any traces to enable?

I added the patch and asked for a perf trace.

>
> Also, wasn't there another issue with a non-standard config?
> Maybe if we fix that it will by chance fix this one too?
>

It was that when CONFIG_RT_GROUP_SCHED was enabled in the kernel config I would
see a large drop in IOPS/throughput. In the current 6.5-rc6 I don't see the
problem anymore. I haven't had a chance to narrow down what fixed it.
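
For anyone trying to reproduce this, here is a rough shell sketch (not a fix,
just a reproduction aid) of the two checks discussed above: whether the host
kernel was built with CONFIG_RT_GROUP_SCHED, and the SCHED_BATCH experiment from
item 6 applied to every vhost worker thread of the QEMU process. It assumes GNU
ps and util-linux chrt, and QEMU_PID is just a placeholder for the QEMU main
process id:

    # Does the running kernel have RT group scheduling built in?
    grep CONFIG_RT_GROUP_SCHED "/boot/config-$(uname -r)" 2>/dev/null || \
        zgrep CONFIG_RT_GROUP_SCHED /proc/config.gz

    # Switch every vhost worker thread of the QEMU process to SCHED_BATCH
    # (priority 0), then print the resulting policy to confirm the change.
    for tid in $(ps -T -o comm,tid --no-headers -p "$QEMU_PID" | \
                 awk '/vhost/ {print $2}'); do
        chrt -b -p 0 "$tid"
        chrt -p "$tid"
    done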