On 5/28/2024 1:30 PM, Luck, Tony wrote:
+	if (unlikely(folio_mc_copy(dst, src))) {
+		folio_ref_unfreeze(src, expected_count);
+		return -EFAULT;
It doesn't look like any code takes action to avoid re-using the poisoned page.
So you survived, hurrah! But left the problem page for some other code to trip over.
Tony, did you mean that memory_failure_queue() should be called? If
not, could you elaborate more?
Maybe memory_failure_queue() can help here, though it would need to know
which pfn inside the folio hit the poison. So it needs some more
infrastructure around the copy to make sure the pfn is saved.
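
To make the "save the pfn" idea concrete, here is a minimal userspace sketch
in C. It is not the kernel patch and not a proposed implementation: every
sketch_* name, the struct layout, and the per-page poison flag are
hypothetical stand-ins for a folio and a machine-check-safe copy. It only
shows the shape of a copy loop that reports which pfn failed, so that a
caller could hand that pfn to something like memory_failure_queue().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for a folio: a base pfn plus a run of pages. */
struct sketch_folio {
	unsigned long base_pfn;
	size_t nr_pages;
	unsigned char *data;           /* nr_pages * SKETCH_PAGE_SIZE bytes */
	const unsigned char *poisoned; /* poisoned[i] != 0: page i is bad */
};

/*
 * Models a machine-check-safe copy of one page.
 * Returns 0 on success, -1 if the source page is marked poisoned.
 */
static int sketch_mc_copy_page(unsigned char *dst,
			       const struct sketch_folio *src, size_t i)
{
	if (src->poisoned[i])
		return -1;
	memcpy(dst, src->data + i * SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
	return 0;
}

/*
 * Copies src to dst page by page. On the first poisoned source page,
 * stores that page's pfn in *bad_pfn and returns -1; pages before it
 * have already been copied. Returns 0 if every page was copied.
 */
static int sketch_folio_mc_copy(struct sketch_folio *dst,
				const struct sketch_folio *src,
				unsigned long *bad_pfn)
{
	size_t i;

	assert(dst->nr_pages == src->nr_pages);

	for (i = 0; i < src->nr_pages; i++) {
		unsigned char *to = dst->data + i * SKETCH_PAGE_SIZE;

		if (sketch_mc_copy_page(to, src, i)) {
			/* Assumption: this is the pfn a kernel caller would queue. */
			*bad_pfn = src->base_pfn + i;
			return -1;
		}
	}
	return 0;
}
```

In this sketch the caller gets the failing pfn back through an out parameter,
so the error path that currently just returns -EFAULT would have what it
needs to report the page rather than leave it for other code to trip over.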
It looks like memory_failure_queue() is only invoked when the poison is
in a source user page, not when it is in a source kernel page. Any idea why?
thanks!
-jane
-Tony