Hi Vlastimil, sorry for the late reply and thanks for your feedback. :)

On Tue, 2021-11-23 at 15:58 +0100, Vlastimil Babka wrote:
> > [1] Other approaches can be found here:
> >
> >   - Static branch conditional on nohz_full, no performance loss, the extra
> >     config option makes is painful to maintain (v1):
> >     https://lore.kernel.org/linux-mm/20210921161323.607817-5-nsaenzju@xxxxxxxxxx/
> >
> >   - RCU based approach, complex, yet a bit less taxing performance wise
> >     (RFC):
> >     https://lore.kernel.org/linux-mm/20211008161922.942459-4-nsaenzju@xxxxxxxxxx/
>
> Hm I wonder if there might still be another alternative possible. IIRC I did
> propose at some point a local drain on the NOHZ cpu before returning to
> userspace, and then avoiding that cpu in remote drains, but tglx didn't like
> the idea of making entering the NOHZ full mode more expensive [1].
>
> But what if we instead set pcp->high = 0 for these cpus so they would avoid
> populating the pcplists in the first place? Then there wouldn't have to be a
> drain at all. On the other hand page allocator operations would not benefit
> from zone lock batching on those cpus. But perhaps that would be acceptable
> tradeoff, as a nohz cpu is expected to run in userspace most of the time,
> and page allocator operations are rare except maybe some initial page
> faults? (I assume those kind of workloads pre-populate and/or mlock their
> address space anyway).

I've looked a bit into this and it seems straightforward. Our workloads
pre-populate everything, and a slight startup performance hit is not that
tragic (I'll measure it nonetheless). The per-cpu nohz_full state will become
dynamic at some point, but the feature seems simple to disable/enable. I'll
have to teach __drain_all_pages(zone, force_all_cpus=true) to bypass this
special case, but that's all.

I might have a go at this.

Thanks!

--
Nicolás Sáenz
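
To illustrate the pcp->high = 0 idea discussed above, here is a toy userspace model, not kernel code. Every name in it (toy_zone, toy_pcp, toy_isolate_cpu and so on) is hypothetical, and the flush rule is a simplification assumed for illustration only. It shows that a CPU with high = 0 never keeps pages on its per-cpu list, so a remote drain has nothing to do there, at the price of one zone lock acquisition per freed page.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model of per-cpu page lists. NOT kernel code: all names
 * and the flush policy are assumptions made for illustration. */
#define TOY_NR_CPUS 4

struct toy_pcp {
	int count;	/* pages currently cached on this CPU's list */
	int high;	/* flush threshold; 0 means "never cache pages here" */
	int batch;	/* max pages handed back to the zone per flush */
};

struct toy_zone {
	int free_pages;		/* pages on the shared free list */
	int lock_acquisitions;	/* times the zone lock would be taken */
	struct toy_pcp pcp[TOY_NR_CPUS];
};

static void toy_zone_init(struct toy_zone *z, int high, int batch)
{
	assert(batch >= 1);
	z->free_pages = 0;
	z->lock_acquisitions = 0;
	for (size_t cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		z->pcp[cpu].count = 0;
		z->pcp[cpu].high = high;
		z->pcp[cpu].batch = batch;
	}
}

/* Stand-in for "this CPU is nohz_full": flush what it holds once,
 * then set high = 0 so its list stays empty from now on. */
static void toy_isolate_cpu(struct toy_zone *z, int cpu)
{
	struct toy_pcp *p = &z->pcp[cpu];

	if (p->count > 0) {
		z->lock_acquisitions++;
		z->free_pages += p->count;
		p->count = 0;
	}
	p->high = 0;
}

/* Free one page on @cpu: cache it locally, and flush up to @batch pages
 * to the zone once the list reaches @high. With high == 0 every free
 * goes straight to the zone, i.e. no lock batching. */
static void toy_free_page(struct toy_zone *z, int cpu)
{
	struct toy_pcp *p = &z->pcp[cpu];

	p->count++;
	if (p->count >= p->high) {
		int n = p->count < p->batch ? p->count : p->batch;

		z->lock_acquisitions++;
		z->free_pages += n;
		p->count -= n;
	}
}

/* Number of CPUs a drain-all would actually have to visit. */
static int toy_cpus_needing_drain(const struct toy_zone *z)
{
	int n = 0;

	for (size_t cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (z->pcp[cpu].count > 0)
			n++;
	return n;
}
```

In this model, freeing pages on an isolated CPU leaves its count at zero, so only the non-isolated CPUs would need to be visited by a drain. The real allocator has more moving parts (allocation-side refill, per-migratetype lists, dynamic high/batch tuning) that the sketch leaves out.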