On Sat, Feb 12, 2022 at 09:37:40PM -0500, Rik van Riel wrote:
> Sometimes the page offlining code can leave behind a hwpoisoned clean
> page cache page. This can lead to programs being killed over and over
> and over again as they fault in the hwpoisoned page, get killed, and
> then get re-spawned by whatever wanted to run them.

Hi Rik,

Do you know exactly how that happens? We should not really be leaving
anything behind, and the soft-offline (not hard-offline) code works on
the premise of only poisoning a page once it has been contained, so I am
wondering what is going on here.

In-use pagecache pages are migrated away, and the actual page is
contained. For clean ones, we already call invalidate_inode_page() and
then contain the page if that succeeds.

One scenario where I can imagine this happening is if, by the time we
call page_handle_poison(), someone has taken another refcount on the
page, so the put_page() does not really free it. But I am not sure that
can happen.

-- 
Oscar Salvador
SUSE Labs