On Wed, 25 Dec 2024, mengensun88@xxxxxxxxx wrote:

> From: MengEn Sun <mengensun@xxxxxxxxxxx>
>
> Since version v5.19-rc7, draining remote per-CPU pools (PCP) no
> longer relies on workqueues; instead, the current CPU is
> responsible for draining the PCPs of all CPUs.
>
> However, due to the lack of scheduling points in the
> __drain_all_pages function, this can lead to soft lockups in
> some extreme cases.
>
> We observed the following soft-lockup stack on a 64-core,
> 223GB machine during testing:
>
> watchdog: BUG: soft lockup - CPU#29 stuck for 23s! [stress-ng-vm]
> RIP: 0010:native_queued_spin_lock_slowpath+0x5b/0x1c0
>  _raw_spin_lock
>  drain_pages_zone
>  drain_pages
>  drain_all_pages
>  __alloc_pages_slowpath
>  __alloc_pages_nodemask
>  alloc_pages_vma
>  do_huge_pmd_anonymous_page
>  handle_mm_fault
>
> Fixes: <443c2accd1b66> ("mm/page_alloc: remotely drain per-cpu lists")

The < > should be removed.

> Reviewed-by: JinLiang Zheng <alexjlzheng@xxxxxxxxxxx>
> Signed-off-by: MengEn Sun <mengensun@xxxxxxxxxxx>
> ---
>  mm/page_alloc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c6c7bb3ea71b..d05b32ec1e40 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2487,6 +2487,7 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus)
>  			drain_pages_zone(cpu, zone);
>  		else
>  			drain_pages(cpu);
> +		cond_resched();
>  	}
>
>  	mutex_unlock(&pcpu_drain_mutex);

This is another example of a soft lockup that we haven't observed
ourselves, and we have systems with many more cores than 64.

Is this happening because of contention on pcp->lock or zone->lock?  I
would assume the latter, but best to confirm.

I think this is just papering over a scalability problem with
zone->lock.  How many NUMA nodes and zones does this 223GB system have?

If this is a problem with zone->lock, it should likely be addressed
more holistically.
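
For reference, a rough sketch of the drain path as I read it.  This is
simplified from mainline and the exact structure (pcp spinlock, batching,
signatures) varies by kernel version, so treat it as an approximation
rather than the code the report was taken from:

	/* Simplified sketch; not the exact code in any one release. */
	static void drain_pages_zone(unsigned int cpu, struct zone *zone)
	{
		struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);

		/* per-CPU lock, but the draining CPU takes it for every CPU */
		spin_lock(&pcp->lock);
		if (pcp->count)
			free_pcppages_bulk(zone, pcp->count, pcp, 0);
		spin_unlock(&pcp->lock);
	}

	static void free_pcppages_bulk(struct zone *zone, int count,
				       struct per_cpu_pages *pcp, int pindex)
	{
		unsigned long flags;

		/* shared zone->lock: every drained CPU funnels through here */
		spin_lock_irqsave(&zone->lock, flags);
		/* return 'count' pages from the pcp lists to the buddy lists */
		spin_unlock_irqrestore(&zone->lock, flags);
	}

Since __drain_all_pages() iterates over every CPU with a populated pcp,
one draining CPU can spin on both of these locks many times in a row
with no scheduling point, which is where the proposed cond_resched()
lands.  But if the time is dominated by zone->lock, the cond_resched()
only hides the contention rather than fixing it.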