On 2022/10/22 12:30 AM, Luck, Tony wrote:
>>> But maybe it is some RMW instruction ... then, if all the above options didn't happen ... we
>>> could get another machine check from the same address. But then we just follow the usual
>>> recovery path.
>
>
>> Let's assume the instruction that causes the COW is in the 63/64 case, i.e.,
>> it is writing a different cache line from the poisoned one. But the new_page
>> allocated in COW is dropped, right? So it might page fault again?
>
> It can, but this should be no surprise to a user that has a signal handler for
> a h/w event (SIGBUS, SIGSEGV, SIGILL) that does nothing to address the
> problem, but simply returns to re-execute the same instruction that caused
> the original trap.
>
> There may be badly written signal handlers that do this. But they just cause
> pain for themselves. Linux can keep taking the traps, fixing things up, and
> sending a new signal over and over.
>
> In this case that loop may involve taking the machine check again, so some
> extra pain for the kernel. But a few generations back, recoverable machine checks
> on Intel/x86 switched from being broadcast to all CPUs to being delivered only to
> the logical CPU that tried to consume the poison. So it is only a bit more painful
> than a repeated page fault.
>
> -Tony

I see, thanks for your patient explanation :)

Best Regards,
Shuai