On 2022/10/21 12:41 PM, Luck, Tony wrote:
>>> When we do return to user mode the task is going to be busy servicing
>>> a SIGBUS ... so shouldn't try to touch the poison page before the
>>> memory_failure() called by the worker thread cleans things up.
>>
>> What about an RT process on a busy system?
>> The worker threads are pretty low priority.
>
> Most tasks don't have a SIGBUS handler ... so they just die without possibility of accessing poison
>
> If this task DOES have a SIGBUS handler, and that for some bizarre reason just does a "return"
> so the task jumps back to the instruction that cause the COW then there is a 63/64
> likelihood that it is touching a different cache line from the poisoned one.
>
> In the 1/64 case ... its probably a simple store (since there was a COW, we know it was trying to
> modify the page) ... so won't generate another machine check (those only happen for reads).
>
> But maybe it is some RMW instruction ... then, if all the above options didn't happen ... we
> could get another machine check from the same address. But then we just follow the usual
> recovery path.
>
> -Tony

Let's assume the instruction that caused the COW is in the 63/64 case, i.e. it is
writing to a different cache line from the poisoned one. But the new_page allocated
in the COW is dropped, right? So might the same instruction page fault again?

Best Regards,
Shuai
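
For reference, the 63/64 figure quoted above can be reproduced with a small user-space sketch. It assumes 4 KiB pages and 64-byte cache lines (typical for x86-64, but not stated in the thread), and it assumes the retried access is equally likely to land on any line of the page. The `sketch_*` names are hypothetical helpers for illustration only, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: 4 KiB pages and 64-byte cache lines (not from the thread). */
#define SKETCH_PAGE_SIZE       4096u
#define SKETCH_CACHE_LINE_SIZE 64u
#define SKETCH_LINES_PER_PAGE  (SKETCH_PAGE_SIZE / SKETCH_CACHE_LINE_SIZE)

static_assert(SKETCH_PAGE_SIZE % SKETCH_CACHE_LINE_SIZE == 0,
              "page size must be a whole number of cache lines");

/* Index (0..63) of the cache line, within its page, that contains addr. */
static inline unsigned int sketch_line_in_page(uintptr_t addr)
{
    return (unsigned int)((addr % SKETCH_PAGE_SIZE) / SKETCH_CACHE_LINE_SIZE);
}

/* Nonzero if the two addresses fall in the same cache line. */
static inline int sketch_same_line(uintptr_t a, uintptr_t b)
{
    return (a / SKETCH_CACHE_LINE_SIZE) == (b / SKETCH_CACHE_LINE_SIZE);
}

/* Number of cache lines in the page other than the poisoned one. */
static inline unsigned int sketch_clean_lines(uintptr_t poison_addr)
{
    unsigned int poisoned = sketch_line_in_page(poison_addr);
    unsigned int clean = 0;

    for (unsigned int i = 0; i < SKETCH_LINES_PER_PAGE; i++) {
        if (i != poisoned)
            clean++;
    }
    return clean;
}
```

With these assumptions, `sketch_clean_lines()` returns 63 out of `SKETCH_LINES_PER_PAGE` (64) for any poisoned address, which is where the 63/64 and 1/64 split comes from. It says nothing about whether the retried store faults again, which is the open question above.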