Hi Marc, On 11/10/17 11:37, Marc Zyngier wrote: > On Thu, Oct 05 2017 at 8:18:11 pm BST, James Morse <james.morse@xxxxxxx> wrote: >> We expect to have firmware-first handling of RAS SErrors, with errors >> notified via an APEI method. For systems without firmware-first, add >> some minimal handling to KVM. >> >> There are two ways KVM can take an SError due to a guest, either may be a >> RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO, >> or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit. >> >> The current SError from EL2 code unmasks SError and tries to fence any >> pending SError into a single instruction window. It then leaves SError >> unmasked. >> >> With the v8.2 RAS Extensions we may take an SError for a 'corrected' >> error, but KVM is only able to handle SError from EL2 if they occur >> during this single instruction window... >> >> The RAS Extensions give us a new instruction to synchronise and >> consume SErrors. The RAS Extensions document (ARM DDI0587), >> '2.4.1 ESB and Unrecoverable errors' describes ESB as synchronising >> SError interrupts generated by 'instructions, translation table walks, >> hardware updates to the translation tables, and instruction fetches on >> the same PE'. This makes ESB equivalent to KVMs existing >> 'dsb, mrs-daifclr, isb' sequence. >> >> Use the alternatives to synchronise and consume any SError using ESB >> instead of unmasking and taking the SError. Set ARM_EXIT_WITH_SERROR_BIT >> in the exit_code so that we can restart the vcpu if it turns out this >> SError has no impact on the vcpu. >> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S >> index 12ee62d6d410..96caa5328b3a 100644 >> --- a/arch/arm64/kvm/hyp/entry.S >> +++ b/arch/arm64/kvm/hyp/entry.S >> @@ -124,6 +124,17 @@ ENTRY(__guest_exit) >> // Now restore the host regs >> restore_callee_saved_regs x2 >> >> +alternative_if ARM64_HAS_RAS_EXTN >> + // If we have the RAS extensions we can consume a pending error >> + // without an unmask-SError and isb. >> + esb >> + mrs_s x2, SYS_DISR_EL1 >> + str x2, [x1, #(VCPU_FAULT_DISR - VCPU_CONTEXT)] >> + cbz x2, 1f >> + msr_s SYS_DISR_EL1, xzr >> + orr x0, x0, #(1<<ARM_EXIT_WITH_SERROR_BIT) >> +1: ret >> +alternative_else >> // If we have a pending asynchronous abort, now is the >> // time to find out. From your VAXorcist book, page 666: >> // "Threaten me not, oh Evil one! For I speak with >> @@ -135,6 +146,8 @@ ENTRY(__guest_exit) >> >> dsb sy // Synchronize against in-flight ld/st >> msr daifclr, #4 // Unmask aborts >> + nop > > Oops. You've now introduced an instruction in what was supposed to be a > single instruction window (the isb). It means that we may fail to > identify the Serror as having been generated by our synchronisation > mechanism, and we'll panic for no good reason. Gah! and I pointed this thing out in the commit message, >> +alternative_endif >> >> // This is our single instruction exception window. A pending >> // SError is guaranteed to occur at the earliest when we unmask and here it is in the diff. > Moving the nop up will solve this. Yes. I've fixed this for v4, along with Julien's suggestions. Thanks! James _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm