[...]
>
>> +	case ESR_ELx_AET_UER:	/* The error has not been propagated */
>> +		/*
>> +		 * Userspace only handle the guest SError Interrupt(SEI) if the
>> +		 * error has not been propagated
>> +		 */
>> +		run->exit_reason = KVM_EXIT_EXCEPTION;
>> +		run->ex.exception = ESR_ELx_EC_SERROR;
>> +		run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>> +		return 0;
>
> We should not pass RAS notifications to user space. The kernel either handles
> them, or it panics(). User space shouldn't even know if the kernel supports RAS

For the ESR_ELx_AET_UER (Recoverable error) case, consider its definition,
taken from [0]:

  The state of the PE is Recoverable if all of the following are true:
  - The error has not been silently propagated.
  - The error has not been architecturally consumed by the PE. (The PE
    architectural state is not infected.)
  - The exception is precise and PE can recover execution from the
    preferred return address of the exception, if software locates and
    repairs the error.
  The PE cannot make correct progress without either consuming the error
  or otherwise making the error unrecoverable. The error remains latent in
  the system. If software cannot locate and repair the error, either the
  application or the VM, or both, must be isolated by software.

So the exception is precise, and the PE can recover execution from the
preferred return address of the exception. It is therefore better to let
the guest handle it: if the RAS error is in a guest application, the guest
can kill that application instead of panicking the whole OS; if it is in
the guest kernel, the guest will panic. The host does not know which guest
application has the error, so the host cannot handle it, and panicking the
host OS is not a good choice for a Recoverable error.

[0] https://static.docs.arm.com/ddi0587/a/RAS%20Extension-release%20candidate_march_29.pdf

> until it gets an MCEERR signal.

User space will detect whether the kernel supports RAS before handling it.
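To make the severity argument concrete, here is a minimal sketch in plain C (not kernel code) of how the AET field that selects `case ESR_ELx_AET_UER` is decoded from an SError syndrome. It assumes the Arm ARM encoding for an SError ESR: EC 0x2F, AET in ISS bits [12:10], and AET valid only when IDS (bit 24) is 0 and DFSC is 0b010001. The macro names, `serror_severity()` and `serror_is_uer()` are hypothetical and are not the kernel's `<asm/esr.h>` definitions. The sketch only classifies the syndrome; it does not model how the patch handles the cases elided from the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SError syndrome fields (ESR_ELx, EC == 0x2F) per the Arm ARM RAS encoding.
 * Hypothetical local names, not the kernel's <asm/esr.h> macros. */
#define ESR_EC_SHIFT    26
#define ESR_EC_MASK     (0x3Fu << ESR_EC_SHIFT)
#define ESR_EC_SERROR   0x2Fu           /* SError interrupt */
#define ESR_IDS         (1u << 24)      /* syndrome is IMPLEMENTATION DEFINED */
#define ESR_AET_SHIFT   10
#define ESR_AET_MASK    (0x7u << ESR_AET_SHIFT)
#define ESR_DFSC_MASK   0x3Fu
#define ESR_DFSC_SERROR 0x11u           /* Asynchronous SError interrupt */

static_assert(ESR_AET_MASK == 0x1C00u, "AET is ISS bits [12:10]");

/* Architected AET encodings; 0b100, 0b101 and 0b111 are reserved. */
enum aet {
	AET_UC  = 0, /* Uncontainable */
	AET_UEU = 1, /* Unrecoverable */
	AET_UEO = 2, /* Restartable (not yet consumed) */
	AET_UER = 3, /* Recoverable */
	AET_CE  = 6, /* Corrected */
};

/* Hypothetical helper: return the error severity, treating any syndrome
 * that lacks an architected AET field as Uncontainable. */
static enum aet serror_severity(uint32_t esr)
{
	uint32_t aet = (esr & ESR_AET_MASK) >> ESR_AET_SHIFT;

	if (((esr & ESR_EC_MASK) >> ESR_EC_SHIFT) != ESR_EC_SERROR)
		return AET_UC;          /* not an SError syndrome at all */
	if (esr & ESR_IDS)
		return AET_UC;          /* IMPLEMENTATION DEFINED: no AET field */
	if ((esr & ESR_DFSC_MASK) != ESR_DFSC_SERROR)
		return AET_UC;          /* AET is only valid with this DFSC */

	switch (aet) {
	case AET_UEU:
	case AET_UEO:
	case AET_UER:
	case AET_CE:
		return (enum aet)aet;
	default:
		return AET_UC;          /* UC, or a reserved encoding */
	}
}

/* Hypothetical predicate for the case argued above: a precise,
 * not-yet-consumed Recoverable (UER) error. */
static bool serror_is_uer(uint32_t esr)
{
	return serror_severity(esr) == AET_UER;
}
```

For example, a syndrome of 0xBE000C11 (EC 0x2F, IL set, AET 0b011, DFSC 0x11) classifies as AET_UER, while the same value with IDS set (0xBF000C11) does not, because an implementation-defined syndrome carries no AET field. Treating anything unrecognised as Uncontainable keeps the fallback on the panic path.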
>
> You're making your firmware-first notification an EL3->EL0 signal, bypassing the OS.
>
> If we get a RAS SError and there are no CPER records or values in the ERR nodes,
> we should panic as it looks like the CPU/firmware is broken. (spurious RAS errors)
>
>
>> +	default:
>> +		/*
>> +		 * Until now, the CPU supports RAS and SEI is fatal, or host
>> +		 * does not support to handle the SError.
>> +		 */
>> +		panic("This Asynchronous SError interrupt is dangerous, panic");
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  /*
>>   * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
>>   * proper exit to userspace.
>
>
>
> James
> _______________________________________________
> kvmarm mailing list
> kvmarm@xxxxxxxxxxxxxxxxxxxxx
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm