On 11.01.22 at 16:35, Mark Rutland wrote:
Several architectures have latent bugs around guest entry/exit, most
notably:
1) Several architectures enable interrupts between guest_enter() and
guest_exit(). As this period is an RCU extended quiescent state (EQS) this
is unsound unless the irq entry code explicitly wakes RCU, which most
architectures only do for entry from userspace or idle.
I believe this affects: arm64, riscv, s390
I am not sure about powerpc.
2) Several architectures permit instrumentation of code between
guest_enter() and guest_exit(), e.g. KASAN, KCOV, KCSAN, etc. As
instrumentation may directly or indirectly use RCU, this has the same
problems as with interrupts.
I believe this affects: arm64, mips, powerpc, riscv, s390
3) Several architectures do not inform lockdep and tracing that
interrupts are enabled during the execution of the guest, or do so in
an incorrect order. Generally this means that logs will report IRQs
being masked for much longer than is actually the case, which is not
ideal for debugging. I don't know whether this affects the correctness
of lockdep.
I believe this affects: arm64, mips, powerpc, riscv, s390
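To make problem 1 concrete, here is a rough userspace model of the
ordering issue. It is only a sketch: the enum, the function and the two
step sequences are hypothetical names made up for illustration, not
kernel APIs, and the model assumes that any interrupt taken while IRQs
are enabled runs host code immediately. It counts the steps after which
IRQs are enabled while RCU is not watching.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model steps; none of these are kernel identifiers. */
enum step {
	IRQ_DISABLE,	/* models local_irq_disable() */
	IRQ_ENABLE,	/* models local_irq_enable() */
	EQS_ENTER,	/* models guest_enter_irqoff(): RCU stops watching */
	EQS_EXIT,	/* models guest_exit_irqoff(): RCU watches again */
	RUN_GUEST,	/* models running the guest, e.g. sie64a() on s390 */
};

#define N_STEPS(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Walk the sequence and count the steps after which an interrupt could
 * be taken (IRQs enabled) while RCU is in an extended quiescent state.
 */
int count_unsafe_windows(const enum step *seq, size_t n)
{
	bool irqs_on = true;
	bool rcu_watching = true;
	int unsafe = 0;

	for (size_t i = 0; i < n; i++) {
		switch (seq[i]) {
		case IRQ_DISABLE: irqs_on = false; break;
		case IRQ_ENABLE:  irqs_on = true; break;
		case EQS_ENTER:   rcu_watching = false; break;
		case EQS_EXIT:    rcu_watching = true; break;
		case RUN_GUEST:   break;	/* no state change in this model */
		}
		if (irqs_on && !rcu_watching)
			unsafe++;
	}
	return unsafe;
}

/* IRQs are re-enabled between the EQS entry and the EQS exit. */
const enum step irqs_on_in_eqs[] = {
	IRQ_DISABLE, EQS_ENTER, IRQ_ENABLE,
	RUN_GUEST,
	IRQ_DISABLE, EQS_EXIT, IRQ_ENABLE,
};

/* One possible ordering that keeps IRQs off for the whole EQS. */
const enum step eqs_with_irqs_off[] = {
	IRQ_DISABLE, EQS_ENTER, RUN_GUEST, EQS_EXIT, IRQ_ENABLE,
};

/* Self-check for the two sequences above. */
void check_orderings(void)
{
	assert(count_unsafe_windows(irqs_on_in_eqs,
				    N_STEPS(irqs_on_in_eqs)) == 2);
	assert(count_unsafe_windows(eqs_with_irqs_off,
				    N_STEPS(eqs_with_irqs_off)) == 0);
}
```

Calling check_orderings() from a main() exercises both sequences. The
second sequence is only one illustrative ordering; whether a given
architecture can actually run its guest with IRQs masked is a separate
question that this model does not answer.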
This was previously fixed for x86 specifically in a series of commits:
87fa7f3e98a1310e ("x86/kvm: Move context tracking where it belongs")
0642391e2139a2c1 ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit")
9fc975e9efd03e57 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit")
3ebccdf373c21d86 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text")
135961e0a7d555fc ("x86/kvm/svm: Move guest enter/exit into .noinstr.text")
160457140187c5fb ("KVM: x86: Defer vtime accounting 'til after IRQ handling")
bc908e091b326467 ("KVM: x86: Consolidate guest enter/exit logic to common helpers")
But other architectures were left broken, and the infrastructure for
handling this correctly is x86-specific.
This series introduces generic helper functions which can be used to
handle the problems above, and migrates architectures over to these,
fixing the latent issues.
I wasn't able to figure my way around powerpc and s390, so I have not
I think two later patches have moved guest_enter()/guest_exit() a bit
further out.
Does this make the s390 code clearer?
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 577f1ead6a51..5859207c2cc0 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4145,10 +4145,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
* As PF_VCPU will be used in fault handler, between
* guest_enter and guest_exit should be no uaccess.
*/
- local_irq_disable();
- guest_enter_irqoff();
- __disable_cpu_timer_accounting(vcpu);
- local_irq_enable();
if (kvm_s390_pv_cpu_is_protected(vcpu)) {
memcpy(sie_page->pv_grregs,
vcpu->run->s.regs.gprs,
@@ -4156,8 +4152,16 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
}
if (test_cpu_flag(CIF_FPU))
load_fpu_regs();
+ local_irq_disable();
+ __disable_cpu_timer_accounting(vcpu);
+ guest_enter_irqoff();
+ local_irq_enable();
exit_reason = sie64a(vcpu->arch.sie_block,
vcpu->run->s.regs.gprs);
+ local_irq_disable();
+ guest_exit_irqoff();
+ __enable_cpu_timer_accounting(vcpu);
+ local_irq_enable();
if (kvm_s390_pv_cpu_is_protected(vcpu)) {
memcpy(vcpu->run->s.regs.gprs,
sie_page->pv_grregs,
@@ -4173,10 +4177,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
vcpu->arch.sie_block->gpsw.mask &= ~PSW_INT_MASK;
}
}
- local_irq_disable();
- __enable_cpu_timer_accounting(vcpu);
- guest_exit_irqoff();
- local_irq_enable();
vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
rc = vcpu_post_run(vcpu, exit_reason);