>> But maybe it is some RMW instruction ... then, if all the above options didn't happen ... we
>> could get another machine check from the same address. But then we just follow the usual
>> recovery path.

> Let's assume the instruction that causes the COW is in the 63/64 case, i.e.,
> it is writing a different cache line from the poisoned one. But the new_page
> allocated in COW is dropped, right? So it might page fault again?

It can, but this should be no surprise to a user who has a signal handler
for a h/w event (SIGBUS, SIGSEGV, SIGILL) that does nothing to address the
problem, but simply returns to re-execute the same instruction that caused
the original trap.

There may be badly written signal handlers that do this. But they just cause
pain for themselves. Linux can keep taking the traps, fixing things up and
sending a new signal over and over.

In this case that loop may involve taking the machine check again, so there
is some extra pain for the kernel. But a few CPU generations back, recoverable
machine checks on Intel/x86 switched from being broadcast to all CPUs to being
delivered only to the logical CPU that tried to consume the poison. So it is
only a bit more painful than a repeated page fault.

-Tony