On 2024/4/10 16:52, Oscar Salvador wrote:
> On Wed, Apr 10, 2024 at 03:52:14PM +0800, Miaohe Lin wrote:
>> AFAICS, iff the check_pages_enabled static key is enabled and we are in
>> hard offline mode, check_new_pages() will prevent those pages from ending
>> up in a PCP queue again when refilling the PCP list, because PageHWPoison
>> pages will be treated as 'bad' pages and skipped during the refill.
>
> Yes, but the check_pages_enabled static key is only enabled when
> either CONFIG_DEBUG_PAGEALLOC or CONFIG_DEBUG_VM is set, which means
> that on most systems that protection will not take place.
>
> Which takes me to a problem we had in the past where we were handing
> over hwpoisoned pages from PCP lists happily.
> Now, for soft-offline mode, we worked hard to stop doing that
> because soft-offline is a non-disruptive operation and no one should get
> killed.
> hard-offline is another story, but still I think that extending the
> comment to include the following would be a good idea:
>
> "Disabling pcp before dissolving the page was a deterministic approach
> because we made sure that those pages cannot end up in any PCP list.
> Draining PCP lists expels those pages to the buddy system, but nothing
> guarantees that those pages do not get back to a PCP queue if we need
> to refill those."

This really helps. Will add it in v2. Thanks Oscar.

>
> Just to remind ourselves of the dangers of a non-deterministic
> approach.
>
>
> Thanks