On 7/22/24 11:15 AM, Zhijian Li (Fujitsu) wrote:
> Hi David
> 
> Thanks for your quick reply.
> 
> 
> On 22/07/2024 14:44, Vlastimil Babka (SUSE) wrote:
>> On 7/22/24 4:10 AM, Li Zhijian wrote:
>>> It's expected that no page should be left in pcp_list after calling
>>> zone_pcp_disable() in offline_pages(). Previously, it was observed that
>>> offline_pages() got stuck [1] due to some pages remaining in pcp_list.
>>>
>>> Cause:
>>> There is a race condition between drain_pages_zone() and __rmqueue_pcplist()
>>> involving the pcp->count variable. See the scenario below:
>>>
>>>         CPU0                                 CPU1
>>>     ----------------                     ---------------
>>>                                          spin_lock(&pcp->lock);
>>>                                          __rmqueue_pcplist() {
>>> zone_pcp_disable() {
>>>                                            /* list is empty */
>>>                                            if (list_empty(list)) {
>>>                                              /* add pages to pcp_list */
>>>                                              alloced = rmqueue_bulk()
>>>   mutex_lock(&pcp_batch_high_lock)
>>>   ...
>>>   __drain_all_pages() {
>>>     drain_pages_zone() {
>>>       /* read pcp->count, it's 0 here */
>>>       count = READ_ONCE(pcp->count)
>>>       /* 0 means nothing to drain */
>>>                                              /* update pcp->count */
>>>                                              pcp->count += alloced << order;
>>>       ...
>>>                                          ...
>>>                                          spin_unlock(&pcp->lock);
>>>
>>> In this case, even after zone_pcp_disable() returns, some pages are still
>>> left in pcp_list. These pages are neither movable nor isolated, so
>>> offline_pages() gets stuck as a result.
>>>
>>> Solution:
>>> Expand the scope of the pcp->lock to also protect pcp->count in
>>> drain_pages_zone(), to ensure no pages are left in the pcp list after
>>> zone_pcp_disable().
>>>
>>> [1] https://lore.kernel.org/linux-mm/6a07125f-e720-404c-b2f9-e55f3f166e85@xxxxxxxxxxx/
>>>
>>> Cc: David Hildenbrand <david@xxxxxxxxxx>
>>> Cc: Vlastimil Babka (SUSE) <vbabka@xxxxxxxxxx>
>>> Reported-by: Yao Xingtao <yaoxt.fnst@xxxxxxxxxxx>
>>> Signed-off-by: Li Zhijian <lizhijian@xxxxxxxxxxx>
>>
>> Can we find a breaking commit for Fixes: ?
>
> I haven't confirmed the FBC because my reproducer cannot run on older kernels
> for some reasons, but I noticed that the count hasn't been read without the
> lock held since the commit below:
>
> 4b23a68f9536 mm/page_alloc: protect PCP lists with a spinlock
>
>
>
>>
>>> ---
>>> V2:
>>>  - Narrow down the scope of the spin_lock() to limit the draining latency. # Vlastimil and David
>>>  - In the above scenario, it's sufficient to read pcp->count once with the lock held, and this
>>>    fully fixed my issue [1] across thousands of runs (before, it happened in more than 5% of runs).
>>
>> That should be ok indeed, but...
>>
>>> RFC:
>>> https://lore.kernel.org/linux-mm/20240716073929.843277-1-lizhijian@xxxxxxxxxxx/
>>> ---
>>>  mm/page_alloc.c | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 9ecf99190ea2..5388a35c4e9c 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -2323,8 +2323,11 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
>>>  static void drain_pages_zone(unsigned int cpu, struct zone *zone)
>>>  {
>>>  	struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
>>> -	int count = READ_ONCE(pcp->count);
>>> +	int count;
>>>  
>>> +	spin_lock(&pcp->lock);
>>> +	count = pcp->count;
>>> +	spin_unlock(&pcp->lock);
>>>  	while (count) {
>>>  		int to_drain = min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
>>>  		count -= to_drain;
>>
>> It's wasteful to do a lock/unlock cycle just to read the count.
> 
> How about,
> 
> static void drain_pages_zone(unsigned int cpu, struct zone *zone)
> {
> 	struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
> 	int count, to_drain;
> 
> 	do {
> 		spin_lock(&pcp->lock);
> 		to_drain = min(pcp->count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
> 		free_pcppages_bulk(zone, to_drain, pcp, 0);
> 		spin_unlock(&pcp->lock);
> 	} while (to_drain);

Yeah, that's better than a break. But I think you should still use

	count = pcp->count;
	...
	count -= to_drain;
} while (count);

or you do one extra wasteful iteration just to find that to_drain is zero.
(Assuming "it's sufficient to read pcp->count once with lock held", which
I agree with.)

> }
>> It could
>> rather look something like this:
>>
> 
> Sorry, I don't follow your code...
> 
>> while (true)
>> 	spin_lock(&pcp->lock);
>> 	count = pcp->count;
>> 	...
>> 	count -= to_drain;
>> 	if (to_drain)
>> 		drain_zone_pages(...)
> 
> Which subroutine does this code belong to, and why does it involve drain_zone_pages?

Yeah, sorry, I meant free_pcppages_bulk().

>> ...
>> 	spin_unlock(&pcp->lock);
>> 	if (count)
>> 		break;
> 
> Thanks
> Zhijian
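
To make it concrete, here is a rough, untested sketch of what I have in mind.
It only reuses the helpers already shown in your diff and assumes, as today,
that free_pcppages_bulk() keeps pcp->count in sync with the list:

static void drain_pages_zone(unsigned int cpu, struct zone *zone)
{
	struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
	int count;

	do {
		int to_drain;

		/* Read the count under the same lock that protects the list. */
		spin_lock(&pcp->lock);
		count = pcp->count;
		to_drain = min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
		if (to_drain)
			free_pcppages_bulk(zone, to_drain, pcp, 0);
		count -= to_drain;
		spin_unlock(&pcp->lock);
		/*
		 * Stop right after the batch that empties the list, instead of
		 * doing one more locked iteration just to see to_drain == 0.
		 */
	} while (count);
}

Reading pcp->count only while holding pcp->lock keeps it coherent with the
list itself, which is what closes the race with __rmqueue_pcplist(), and the
local count avoids the extra locked iteration your do/while takes after the
last batch has drained everything.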