[RFC PATCH v1 0/2] Avoid rcu_core() if CPU just left guest vcpu

I am dealing with a latency issue inside a KVM guest, which is caused by
a sched_switch to rcuc[1].

During guest entry, kernel code signals to RCU that the current CPU is in
a quiescent state, making sure no other CPU is waiting for this one.

If a vcpu just stopped running (guest_exit), and a synchronize_rcu() was
issued somewhere since guest entry, there is a chance a timer interrupt
will happen in that CPU, which will cause rcu_sched_clock_irq() to run.

rcu_sched_clock_irq() will check rcu_pending() which will return true,
and cause invoke_rcu_core() to be called, which will (in current config)
cause rcuc/N to be scheduled into the current cpu.

On rcu_pending(), I noticed we can avoid returning true (and thus invoking
rcu_core()) if the current cpu is nohz_full, and the cpu came from either
idle or userspace, since both are considered quiescent states.
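
To make the condition above concrete, here is a minimal user-space model
of that early-out. It is only a sketch: struct cpu_model and
model_rcu_pending() are hypothetical names, each field stands in for a
kernel-side query, and the real rcu_pending() checks more conditions than
shown here.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code: one snapshot of a CPU at tick time. */
struct cpu_model {
	bool nohz_full;      /* CPU is configured as nohz_full */
	bool from_user;      /* the tick interrupted userspace */
	bool from_idle;      /* the tick interrupted the idle loop */
	bool gp_in_progress; /* a grace period is currently running */
	bool gp_is_recent;   /* that grace period started under 1s ago */
};

/*
 * Returns true when the model would ask for rcu_core() to run.
 * A nohz_full CPU that came from idle or userspace is already in a
 * quiescent state, so for a recent grace period RCU is ignored.
 */
static bool model_rcu_pending(const struct cpu_model *c)
{
	if ((c->from_user || c->from_idle) && c->gp_in_progress &&
	    c->nohz_full && c->gp_is_recent)
		return false;

	/* Simplification: otherwise any running grace period counts as work. */
	return c->gp_in_progress;
}
```

In this model a CPU that was interrupted while in kernel context (both
from_user and from_idle false) still reports pending work, which is the
case a CPU that just left a guest falls into.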

Since this is also true for guest context, my idea is to solve this
latency issue by avoiding the rcu_core() invocation if the cpu was running
a guest vcpu.

On the other hand, I could not find a way of reliably saying the current
cpu was running a guest vcpu, so patch #1 implements a per-cpu variable
for keeping the time (jiffies) of the last guest exit.

In patch #2 I compare the current time to that time, and if less than a
second has passed, we just skip the rcu_core() invocation, since there is a
high chance the cpu will just go back to the guest in a moment.
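
The two steps above could be sketched in user space roughly as follows.
This is an illustration under assumptions, not the code from the patches:
all model_* names, MODEL_HZ and MODEL_NR_CPUS are made up for the example,
the array stands in for the per-cpu variable, and the tick count is passed
in explicitly instead of reading jiffies.

```c
#include <stdbool.h>

#define MODEL_HZ      1000UL /* assumed tick rate, stands in for HZ */
#define MODEL_NR_CPUS 4      /* arbitrary size for the example */

/* Stands in for the per-cpu variable; 0 means "no guest exit seen yet". */
static unsigned long model_last_guest_exit[MODEL_NR_CPUS];

/* Wraparound-safe "a is before b", the same idea as time_before(). */
static bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Patch #1 idea: remember the tick count of the last guest exit. */
static void model_guest_exit(int cpu, unsigned long now)
{
	model_last_guest_exit[cpu] = now;
}

/*
 * Patch #2 idea: on a nohz_full CPU, skip rcu_core() while less than
 * one second has passed since the last guest exit.
 */
static bool model_skip_rcu_core(int cpu, bool nohz_full, unsigned long now)
{
	unsigned long last = model_last_guest_exit[cpu];

	/* Model shortcut: a stamp of 0 is treated as "never ran a guest". */
	if (!nohz_full || last == 0)
		return false;

	return model_time_before(now, last + MODEL_HZ);
}
```

With MODEL_HZ at 1000, a guest exit recorded at tick 5000 makes the check
skip rcu_core() at tick 5500 but not at tick 6000. The signed-difference
comparison keeps the result correct when the tick counter wraps.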

What I know is weird with this patchset:
1 - Not sure if this is the best way of finding out if the cpu was
    running a guest recently.

2 - This per-cpu variable needs to get set at each guest_exit(), so it's
    overhead, even though it's supposed to be in local cache. If that's
    an issue, I would suggest having this part compiled out on
    !CONFIG_NO_HZ_FULL, but further checking each cpu for being nohz_full
    enabled seems more expensive than just setting the variable.

3 - It checks if the guest exit happened more than 1 second ago. This 1
    second value was copied from rcu_nohz_full_cpu() which checks if the
    grace period started more than a second ago. If this value is bad,
    I have no issue changing it.

4 - Even though I could detect no issue, I included linux/kvm_host.h into 
    rcu/tree_plugin.h, which is the first time it's getting included
    outside of kvm or arch code, and can be weird. An alternative would
    be to create a new header for providing data for non-kvm code.
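
For item 2, the compile-out alternative could look roughly like the
sketch below. MODEL_NO_HZ_FULL is a hypothetical macro standing in for
CONFIG_NO_HZ_FULL, and model_exit_stamp / model_note_guest_exit() are
made-up names for this example only.

```c
/* Stands in for the per-cpu timestamp; a single CPU is modelled here. */
static unsigned long model_exit_stamp;

#ifdef MODEL_NO_HZ_FULL
/* Tracking enabled: one plain store on every guest exit. */
static inline void model_note_guest_exit(unsigned long now)
{
	model_exit_stamp = now;
}
#else
/* Tracking compiled out: the call becomes a no-op. */
static inline void model_note_guest_exit(unsigned long now)
{
	(void)now;
}
#endif
```

Callers stay identical in both configurations, so the guest exit path
would not need an #ifdef of its own. The trade-off described in item 2
remains: the enabled variant stores on every exit, on every cpu, instead
of first checking whether the cpu is nohz_full.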

Please provide feedback.

[1]: The setup uses a PREEMPT_RT kernel, with the guest cpus running on
isolated, rcu_nocbs, nohz_full cpus.

Leonardo Bras (2):
  kvm: Implement guest_exit_last_time()
  rcu: Ignore RCU in nohz_full cpus if it was running a guest recently

 include/linux/kvm_host.h | 13 +++++++++++++
 kernel/rcu/tree_plugin.h | 14 ++++++++++++++
 kernel/rcu/tree.c        |  4 +++-
 virt/kvm/kvm_main.c      |  3 +++
 4 files changed, 33 insertions(+), 1 deletion(-)

base-commit: 8d025e2092e29bfd13e56c78e22af25fac83c8ec
