Hi Borislav, On 17/05/18 14:36, Borislav Petkov wrote: > On Wed, May 16, 2018 at 03:51:14PM +0100, James Morse wrote: >> I thought this was safe because its just ghes_copy_tofrom_phys()s access to the >> fixmap slots that needs mutual exclusion. and here is where I was wrong: I was only looking at reading the data, we then dump it in struct ghes assuming it can only be notified on once CPU at a time. Oops. > For example: > ghes->estatus from above, before the NMI fired, has gotten some nice > scribbling over. AFAICT. Yup, thanks for the example! > Now, I don't know whether this can happen with the ARM facilities but if > they're NMI-like, I don't see why not. NOTIFY_SEA is synchronous so the error has to be something to do with the instruction that was interrupted. In your example this would mean the APEI code/data was corrupted, which there is little point trying to handle. NOTIFY_{SEI, SDEI} on the other hand are asynchronous, so this could happen. > Which means, that this code is not really reentrant and if should be > fixed to be callable from different contexts, then it should use private > buffers and be careful about locking. ... I need to go through this thing again to work out how the firmware-buffers map on to estatus=>ghes ... > Oh, and that > > if (in_nmi) > lock() > else > lock_irqsave() > > pattern is really yucky. And it is an explosion waiting to happen. The whole in_nmi()=>other-lock think looks like a hack to make a warning go away. We could get the notification to take whatever lock is appropriate further out, but it may mean some code duplication. (I'll put it on my list...) Thanks, James