> I would like to know what is introducing this latency. Is it related to the fact
> that the CPU running KVM periodically enters IDLE mode? Why do we
> have this behavior in 5.15 and not in 4.19? This is easily reproducible when using vcpu-pinning on a guest,

I think I got to the bottom of this. Buckle up, this is a bit tricky :)

I believe these latencies come from hitting RT-throttling on your host. This can be verified in your kernel logs, where you should find "RT throttling activated" (printed only once). By default RT-throttling prevents an RT process from consuming more than 95% of a CPU's runtime (actually 950000us out of every 1000000us, as you can check in /proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us).

Why is throttling activated? Because your guest vCPU threads are above 95% utilisation, which may seem weird since the cyclictest process is not consuming that much... so what's happening? Indeed the CPU usage seen from inside your guest is small, so why is it so high from the host's perspective?

The root cause is actually the default value of the kvm halt_poll_ns module parameter (200000ns, matching your cyclictest interval of 200us), combined with the fact that you are probably using vcpu-pinning (or any other mechanism/constraint that pins each vCPU to a single host CPU).

From https://www.kernel.org/doc/Documentation/virtual/kvm/halt-polling.txt:

"The KVM halt polling system provides a feature within KVM whereby the latency of a guest can, under some circumstances, be reduced by polling in the host for some time period after the guest has elected to no longer run by cedeing."

When the cyclictest interval is larger than halt_poll_ns, polling does not help (it is never interrupted by a wakeup) and the grow/shrink algorithm drives the polling interval down to 0: "In the event that the total block time was greater than the global max polling interval then the host will never poll for long enough (limited by the global max) to wakeup during the polling interval so it may as well be shrunk in order to avoid pointless polling."

But when the cyclictest interval becomes smaller than halt_poll_ns, a wakeup source is received within the polling window: "During polling if a wakeup source is received within the halt polling interval, the interval is left unchanged." So polling continues with the same value, again and again, which puts us in this situation:

"Care should be taken when setting the halt_poll_ns module parameter as a large value has the potential to drive the cpu usage to 100% on a machine which would be almost entirely idle otherwise. This is because even if a guest has wakeups during which very little work is done and which are quite far apart, if the period is shorter than the global max polling interval (halt_poll_ns) then the host will always poll for the entire block time and thus cpu utilisation will go to 100%."

To sum up: when you run cyclictest with an interval <= halt_poll_ns and the vCPU is pinned, the vCPU thread and the CPU it runs on will naturally hit 100%, which makes the host throttle the RT process (by default for 50ms every second, which is precisely the kind of latency you observe).

If the vCPU is not pinned to a CPU, I guess the high load from the RT process is more easily migrated to different CPUs by the scheduler, so even though the process shows 100% usage in total, no single CPU hits RT-throttling. I don't know exactly what makes the process migrate in that case; if someone knows, please feel free to respond.
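In case it helps, this is roughly how I would check for this situation from the host (run as root; the /sys/module/kvm/parameters/halt_poll_ns path and the "qemu" process name below are from my own setup, adjust to yours):

    # was RT throttling ever triggered? (the message is printed only once per boot)
    dmesg | grep -i "rt throttling"

    # current budget: by default 950000us of RT runtime per 1000000us period
    cat /proc/sys/kernel/sched_rt_runtime_us /proc/sys/kernel/sched_rt_period_us

    # max halt-polling interval used by kvm (200000ns by default)
    cat /sys/module/kvm/parameters/halt_poll_ns

    # per-thread CPU usage of the guest process, to spot a vCPU thread stuck at ~100%
    pidstat -t -p "$(pgrep -d, -f qemu)" 1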
What's weird is that you say you still encounter this situation with halt_poll_ns lowered to 50000ns, and that's not supposed to happen. My guess is that you may not have restarted your guest, so it didn't pick up the new value. My experience (I didn't check the kvm code) is that this setting is read per guest, at guest startup.

Now for the last part of the problem: why you didn't hit this with kernel v4.19. With v4.19 you didn't hit RT-throttling, even though the vCPU was using 100% of its CPU runtime (you can check in your kernel logs). The reason is that on kernels < 5.10 the scheduler comes with RT runtime sharing enabled (RT_RUNTIME_SHARE is true by default). Since kernel v5.10, RT_RUNTIME_SHARE is disabled by default.

https://lore.kernel.org/lkml/c596a06773658d976fb839e02843a459ed4c2edf.1479204252.git.bristot@xxxxxxxxxx/

"RT_RUNTIME_SHARE sched feature enables the sharing of rt_runtime between CPUs, allowing a CPU to run a real-time task up to 100% of the time while leaving more space for non-real-time tasks to run on the CPU that lend rt_runtime"

My understanding is that with RT_RUNTIME_SHARE enabled (on your 4.19 kernel), the vCPU needing 100% runtime would "borrow" the extra 5% from another CPU's rt_runtime, and thus would not get hit by RT-throttling.

To conclude, you have 5 ways to avoid this problem (the first 3 simply keep you out of this very specific situation):

1) Increase your cyclictest interval, so that you don't hit the halt-polling problem at the default value of 200us; your vCPU then won't use 100% of its CPU.

2) Lower the halt-polling max interval, so that cyclictest at 200us stays above it (you can also disable halt polling altogether by setting halt_poll_ns to 0); your vCPU won't use 100% of its CPU. This is what you tested and it should have solved the issue; be sure to stop the VM and start it again after changing halt_poll_ns.

3) Do not use vcpu-pinning, so that the high CPU load is spread across different CPUs by the scheduler (if you have more CPUs than vCPUs, of course...).

4) Disable RT-throttling (echo -1 > /proc/sys/kernel/sched_rt_runtime_us): even though the vCPU uses 100% of its CPU, it won't be throttled, so you won't get these latencies.

5) Enable RT_RUNTIME_SHARE (echo RT_RUNTIME_SHARE > /sys/kernel/debug/sched/features) so that you are exactly in the kernel < 5.10 situation (no throttling thanks to sharing). Since you are compiling a custom kernel, note that you need CONFIG_SCHED_DEBUG to play with scheduler features.

I personally would keep using vcpu-pinning, but avoid testing with a cyclictest interval <= halt_poll_ns, since it's a rather particular situation that drives your CPU usage to 100%, which is something you probably do not want anyway. And since you are using isolated CPUs for your RT workloads, I would also disable RT-throttling, which is kind of a standard best practice when trying to achieve the best latencies (see the Red Hat tuned profile for realtime, for example):
https://github.com/redhat-performance/tuned/blob/9fa66f19de78f31009fdaf3968a6d75686c190bc/profiles/realtime/tuned.conf#L44
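For reference, these are the knobs behind options 2), 4) and 5) above, to be run as root on the host (the halt_poll_ns path is where the kvm module exposes the parameter on my systems; adjust values and paths to your setup):

    # 2) lower or disable (0) halt polling, then stop and start the guest again
    echo 0 > /sys/module/kvm/parameters/halt_poll_ns

    # 4) disable RT-throttling entirely
    echo -1 > /proc/sys/kernel/sched_rt_runtime_us

    # 5) get back the pre-5.10 behaviour (needs CONFIG_SCHED_DEBUG)
    echo RT_RUNTIME_SHARE > /sys/kernel/debug/sched/features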