On Fri, Jul 12, 2024 at 05:57:10PM +0200, Paolo Bonzini wrote:
> On 7/11/24 01:18, Leonardo Bras wrote:
> > What are your thoughts on above results?
> > Anything you would suggest changing?
>

Hello Paolo, thanks for the feedback!

> Can you run the test with a conditional on "!tick_nohz_full_cpu(vcpu->cpu)"?
>
> If your hunch is correct that nohz-full CPUs already avoid invoke_rcu_core()
> you might get the best of both worlds.
>
> tick_nohz_full_cpu() is very fast when there is no nohz-full CPU, because
> then it shortcuts on context_tracking_enabled() (which is just a static
> key).

But that would mean not noting an RCU quiescent state in guest_exit of
nohz_full cpus, right?

The original issue we were dealing with was having invoke_rcu_core() running
on nohz_full cpus, and messing up the latency of RT workloads inside the VM.

While most of the invoke_rcu_core() calls get ignored by the nohz_full rule,
there are some scenarios in which the vcpu thread may take more than 1s
between a guest_entry and the next one (VM busy), and the calls which did not
get ignored have caused latency peaks in our tests.

The main idea of this patch is to note RCU quiescent states on guest_exit on
nohz_full cpus (and use rcu.patience), so that invoke_rcu_core() does not run
between a guest_exit and the next guest_entry whenever less than rcu.patience
milliseconds pass between exit and entry, thus avoiding the latency increase.
(I am appending a rough sketch of what I mean after my signature.)

What I tried to prove above is that it also improves non-isolated cores,
since rcu_core will not run as often, saving cpu cycles that can be used by
the VM.

What are your thoughts on that?

Thanks!
Leo
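
As mentioned above, here is a rough sketch of the idea, just to make it
concrete. Every name below (note_guest_exit_qs(), within_patience_window(),
guest_exit_last_qs, rcu_patience_ms) is made up for illustration and is not
what the patch actually adds; the only existing kernel APIs used are
rcu_momentary_dyntick_idle(), the per-cpu helpers and the jiffies macros.

/*
 * Sketch only -- illustrative names, not the actual patch.
 *
 * Exit side: report an RCU quiescent state at guest_exit and remember
 * when we did it.  RCU side: as long as we are within rcu.patience
 * milliseconds of that quiescent state, there is no reason to raise
 * the RCU softirq (invoke_rcu_core()) on this cpu.
 */

#include <linux/jiffies.h>
#include <linux/percpu.h>
#include <linux/rcupdate.h>

/* Per-cpu timestamp (jiffies) of the last guest-exit quiescent state. */
static DEFINE_PER_CPU(unsigned long, guest_exit_last_qs);

/* Stand-in for the rcu.patience parameter, in milliseconds. */
static unsigned int rcu_patience_ms = 1000;

/* Would be called from the guest_exit path. */
static void note_guest_exit_qs(void)
{
	/*
	 * Report a quiescent state right now, so RCU has nothing left
	 * to ask from this cpu between exit and the next entry.
	 * rcu_momentary_dyntick_idle() is one existing way to do that;
	 * whether to gate this on the cpu being nohz_full, or on
	 * !tick_nohz_full_cpu() as you suggest, is exactly what we are
	 * discussing above.
	 */
	rcu_momentary_dyntick_idle();
	this_cpu_write(guest_exit_last_qs, jiffies);
}

/* Would be checked before deciding to call invoke_rcu_core(). */
static bool within_patience_window(void)
{
	unsigned long last_qs = this_cpu_read(guest_exit_last_qs);

	/*
	 * If the last guest-exit quiescent state is less than
	 * rcu.patience milliseconds old, skip invoke_rcu_core();
	 * the next guest exit will note another quiescent state.
	 */
	return time_before(jiffies, last_qs + msecs_to_jiffies(rcu_patience_ms));
}

In the real patch, of course, the exit-side note and the patience check are
wired into the existing context-tracking and rcu_pending() paths rather than
standalone helpers like these.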