On 2024/4/10 0:10, Oscar Salvador wrote:
> On Tue, Apr 09, 2024 at 04:10:22PM +0200, Oscar Salvador wrote:
>> On Sun, Apr 07, 2024 at 04:54:56PM +0800, Miaohe Lin wrote:
>>> In short, below scene breaks the lock dependency chain:
>>>
>>>  memory_failure
>>>   __page_handle_poison
>>>    zone_pcp_disable -- lock(pcp_batch_high_lock)
>>>    dissolve_free_huge_page
>>>     __hugetlb_vmemmap_restore_folio
>>>      static_key_slow_dec
>>>       cpus_read_lock -- rlock(cpu_hotplug_lock)
>>>
>>> Fix this by calling drain_all_pages() instead.
>>>
>>> Signed-off-by: Miaohe Lin <linmiaohe@xxxxxxxxxx>
>>
>> Acked-by: Oscar Salvador <osalvador@xxxxxxx>

Thanks.

>
> On second thought,
>
> disabling pcp via zone_pcp_disable() was a deterministic approach.
> Now, with drain_all_pages() we drain PCP queues to buddy, but nothing
> guarantees that those pages do not end up in a PCP queue again before
> the call to take_page_off_buddy() if we need refilling, right?

AFAICS, iff the check_pages_enabled static key is enabled and we are in
hard offline mode, check_new_pages() will prevent those pages from ending
up in a PCP queue again when refilling the PCP list, because PageHWPoison
pages will be treated as 'bad' pages and skipped during the refill.

>
> I guess we can live with that because we will let the system know that we
> failed to isolate that page.

We're trying our best to isolate that page anyway. :)

Thanks for your thoughts.
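
To make the refill argument above concrete, here is a minimal userspace sketch in C. It is not kernel code: struct toy_page, toy_check_pages_enabled, toy_check_new_page() and toy_refill_pcp() are hypothetical names invented for this illustration, and the model only mirrors the claim in the reply (a poisoned page is treated as 'bad' and skipped when the check is enabled). The hard offline condition is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page: only what this sketch needs. */
struct toy_page {
	int pfn;
	bool hwpoison;	/* models PageHWPoison() */
};

/* Assumption: a plain bool standing in for the check_pages_enabled static key. */
static bool toy_check_pages_enabled = true;

/* Models the check described in the reply: true means the page is 'bad'. */
static bool toy_check_new_page(const struct toy_page *page)
{
	return toy_check_pages_enabled && page->hwpoison;
}

/*
 * Models a PCP refill: copy up to 'want' pages from the buddy free list
 * into the pcp array, skipping pages the check rejects.
 * Returns the number of pages placed on the pcp list.
 */
static size_t toy_refill_pcp(const struct toy_page *buddy, size_t nr_buddy,
			     struct toy_page *pcp, size_t want)
{
	size_t filled = 0;

	assert(buddy != NULL && pcp != NULL);

	for (size_t i = 0; i < nr_buddy && filled < want; i++) {
		if (toy_check_new_page(&buddy[i]))
			continue;	/* poisoned page never reaches the pcp list */
		pcp[filled++] = buddy[i];
	}
	return filled;
}
```

In this model, refilling from a free list that holds one poisoned page leaves that page behind while the check is enabled. With the check disabled the page can land on the pcp list again, which is the window Oscar is asking about.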