On Fri, May 28, 2021 at 11:08:01AM +0200, David Hildenbrand wrote:
> On 28.05.21 11:03, David Hildenbrand wrote:
> > On 28.05.21 10:55, Mel Gorman wrote:
> > > On Thu, May 27, 2021 at 12:36:21PM -0700, Dave Hansen wrote:
> > > > Hi Mel,
> > > >
> > > > Feng Tang tossed these on a "Cascade Lake" system with 96 threads and
> > > > ~512G of persistent memory and 128G of DRAM. The PMEM is in "volatile
> > > > use" mode and being managed via the buddy just like the normal RAM.
> > > >
> > > > The PMEM zones are big ones:
> > > >
> > > >         present  65011712 = 248 G
> > > >         high       134595 = 525 M
> > > >
> > > > The PMEM nodes, of course, don't have any CPUs in them.
> > > >
> > > > With your series, the pcp->high value per-cpu is 69584 pages or about
> > > > 270MB per CPU. Scaled up by the 96 CPU threads, that's ~26GB of
> > > > worst-case memory in the pcps per zone, or roughly 10% of the size of
> > > > the zone.
> >
> > When I read about having such big amounts of free memory theoretically
> > stuck in PCP lists, I guess we really want to start draining the PCP in
> > alloc_contig_range(), just as we do with memory hotunplug when offlining.
> >
>
> Correction: we already drain the pcp, we just don't temporarily disable it,
> so a race as described in offline_pages() could apply:
>
> "Disable pcplists so that page isolation cannot race with freeing
>  in a way that pages from isolated pageblock are left on pcplists."
>
> Guess we'd then want to move the draining before start_isolate_page_range()
> in alloc_contig_range().
>

Or instead of draining, validate the PFN range in alloc_contig_range is
within the same zone and if so, call zone_pcp_disable() before
start_isolate_page_range and enable after __alloc_contig_migrate_range.

-- 
Mel Gorman
SUSE Labs
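
For illustration only, the ordering suggested above could be sketched as
the userspace mock below. It is not a kernel patch and has not been tested
against any tree. Every mock_* helper, the call_log buffer and
MOCK_PAGES_PER_ZONE are hypothetical stand-ins that only record call order.
They do not share signatures with the real zone_pcp_disable(),
start_isolate_page_range() or __alloc_contig_migrate_range(). What happens
when the range spans zones is not stated above, so the mock simply assumes
the pcplists are left enabled in that case.

```c
#include <stdbool.h>
#include <string.h>

/* Made-up zone size so the mock can map a PFN to a zone index. */
#define MOCK_PAGES_PER_ZONE 1024UL

/* Space-separated record of the mock calls made by the last run. */
static char call_log[256];

static void log_call(const char *name)
{
	if (call_log[0] != '\0')
		strncat(call_log, " ", sizeof(call_log) - strlen(call_log) - 1);
	strncat(call_log, name, sizeof(call_log) - strlen(call_log) - 1);
}

/* Hypothetical stand-in for looking up the zone that owns a PFN. */
static unsigned long mock_zone_of(unsigned long pfn)
{
	return pfn / MOCK_PAGES_PER_ZONE;
}

/* Stand-ins for the kernel helpers named above; they only log. */
static void mock_zone_pcp_disable(void) { log_call("pcp_disable"); }
static void mock_zone_pcp_enable(void)  { log_call("pcp_enable"); }
static int mock_start_isolate(void)     { log_call("isolate"); return 0; }
static int mock_migrate_range(void)     { log_call("migrate"); return 0; }

/* Mock of the proposed flow for the PFN range [start, end). */
static int mock_alloc_contig_range(unsigned long start, unsigned long end)
{
	/* Validate that the whole range sits in a single (mock) zone. */
	bool same_zone = end > start &&
			 mock_zone_of(start) == mock_zone_of(end - 1);
	int ret;

	call_log[0] = '\0';

	/* Disable pcplists before isolation so freed pages cannot land there. */
	if (same_zone)
		mock_zone_pcp_disable();

	ret = mock_start_isolate();
	if (ret == 0)
		ret = mock_migrate_range();

	/* Re-enable only after migration has finished. */
	if (same_zone)
		mock_zone_pcp_enable();

	return ret;
}
```

In this mock, a range inside one zone logs the disable call ahead of
isolation and the enable call after migration, while a range that crosses
the made-up zone boundary logs only isolation and migration.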