> What I'm still unclear on, does this new version address that
> "mysterious" hang or panic which the validation team triggered or you
> haven't checked yet?

No :-(

They are triggering some case where multiple threads in a process hit
the same poison, and somehow memory_failure() fails to complete
offlining the page. At this point any other threads that hit that page
get the early return from memory_failure (because the page flags say
it is poisoned) ... and so we loop.

But the "recover from cases where multiple machine checks happen
simultaneously" case is orthogonal to the "do the right thing to
recover when the kernel touches poison at a user address". So I think
we can tackle them separately

-Tony