Dear James,

Thanks for this mail, and sorry for my late response.

2018-02-16 1:55 GMT+08:00 James Morse <james.morse@xxxxxxx>:
> Hi gengdongjiu, liu jun
>
> On 05/02/18 11:24, gengdongjiu wrote:
[....]
>>
>>> Is the emulated SError routed following the routing rules for HCR_EL2.{AMO,
>>> TGE}?
>>
>> Yes, it is.
>
> ... and yet ...
>
>
>>> What does your firmware do when it wants to emulate SError but its masked?
>>> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
>>> PSTATE.A set.
>>> e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
>>> emulated SError should go to EL1. This effectively masks SError.)
>>
>> Currently we does not consider much about the mask status(SPSR).
>
> .. this is a problem.
>
> If you ignore SPSR_EL3 you may deliver an SError to EL1 when the exception
> interrupted EL2. Even if you setup the EL1 register correctly, EL1 can't eret to
> EL2. This should never happen, SError is effectively masked if you are running
> at an EL higher than the one its routed to.
>
> More obviously: if the exception came from the EL that SError should be routed
> to, but PSTATE.A was set, you can't deliver SError. Masking SError is the only

James, to confirm with you for the firmware-first solution, I have summarized
the masking and routing rules for SError as follows:

1. If HCR_EL2.{AMO,TGE} is set, the SError should be routed to EL2. When an
   SError occurs and traps to EL3, if EL3 finds that HCR_EL2.{AMO,TGE} and
   SPSR_EL3.A are both set and that the SError came from EL2, it will not
   deliver an SError: it stores the RAS error in the BERT and 'reboots'. But
   if it finds that the SError came from EL1 or EL0, it still needs to deliver
   an SError, right?

2. If HCR_EL2.{AMO,TGE} is not set, the SError should be routed to EL1. When
   an SError occurs and traps to EL3, if EL3 finds that HCR_EL2.{AMO,TGE} and
   SPSR_EL3.A are both not set and that the SError came from EL1, it will not
   deliver an SError: it stores the RAS error in the BERT and 'reboots'. But
   if it finds that the SError came from EL0, it still needs to deliver an
   SError, right?

> way the OS has to indicate it can't take an exception right now. VBAR_EL1 may be
> 'wrong' if we're doing some power-management, the FAR/ELR/ESR/SPSR registers may
> contain live values that the OS would lose if you deliver another exception over
> the top.
>
> If you deliver an emulated-SError as the OS eret's, your new ELR will point at
> the eret instruction and the CPU will spin on this instruction forever.
>
> You have to honour the masking and routing rules for SError, otherwise no OS can
> run safely with this firmware.
>
>
>> I remember that you ever suggested firmware should reboot if the mask status
>> is set(SPSR), right?
>
> Yes, this is my suggestion of what to do if you can't deliver an SError: store
> the RAS error in the BERT and 'reboot'.
>
>
>> I CC "liu jun" <liujun88@xxxxxxxxxxxxx> who is our UEFI firmware Architect,
>> if you have firmware requirements, you can raise again.
>
> (UEFI? I didn't think there was any of that at EL3, but I'm not familiar with
> all the 'PI' bits).
>
> The requirement is your emulated-SError from EL3 looks exactly like a
> physical-SError as if EL3 wasn't implemented.
> Your CPU has to handle cases where it can't deliver an SError, your emulation
> has to do the same.
>
> This is not something any OS can work around.
>
>
>>> Answers to these let us determine whether a bug is in the firmware or the
>>> kernel. If firmware is expecting the OS to do something special, I'd like to know
>>> about it from the beginning!
>>
>> I know your meaning, thanks for raising it again.
>
>
> Happy new year,
>
> James
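
For reference, the masking and routing rules James describes in the quoted
text can be sketched as a small decision function. This is only an
illustration under stated assumptions, not code from any real firmware: the
names decide_serror, route_to_el2, from_el and spsr_a are hypothetical, and
"set" for HCR_EL2.{AMO,TGE} is assumed to mean that either bit is set. The
lower-EL case (deliver regardless of SPSR_EL3.A) follows the question raised
above and is not confirmed in this thread. The sketch follows James's wording
that delivery is blocked when PSTATE.A was set at the EL the SError is routed
to, so an SError from EL1 with SPSR_EL3.A clear is delivered to EL1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum el { EL0 = 0, EL1 = 1, EL2 = 2 };

enum serror_action {
    DELIVER_TO_EL1,
    DELIVER_TO_EL2,
    RECORD_IN_BERT_AND_REBOOT,
};

/*
 * route_to_el2: assumed true when HCR_EL2.AMO or HCR_EL2.TGE is set.
 * from_el:      the EL that the physical SError interrupted.
 * spsr_a:       SPSR_EL3.A, i.e. PSTATE.A of the interrupted context.
 */
static enum serror_action decide_serror(bool route_to_el2, enum el from_el,
                                        bool spsr_a)
{
    enum el target = route_to_el2 ? EL2 : EL1;

    assert(from_el <= EL2);

    /* Running above the target EL: SError is effectively masked. */
    if (from_el > target)
        return RECORD_IN_BERT_AND_REBOOT;

    /* Running at the target EL with PSTATE.A set: cannot deliver. */
    if (from_el == target && spsr_a)
        return RECORD_IN_BERT_AND_REBOOT;

    /*
     * Running at the target EL with PSTATE.A clear, or below the target EL.
     * Assumption: from a lower EL, SPSR_EL3.A does not block delivery.
     */
    return target == EL2 ? DELIVER_TO_EL2 : DELIVER_TO_EL1;
}
```

With this sketch, decide_serror(true, EL2, true) gives
RECORD_IN_BERT_AND_REBOOT, while decide_serror(true, EL1, true) gives
DELIVER_TO_EL2.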