On 4/14/21 3:39 PM, Mel Gorman wrote: > struct per_cpu_pages is protected by the pagesets lock but it can be > embedded within struct per_cpu_pages at a minor cost. This is possible > because per-cpu lookups are based on offsets. Paraphrasing an explanation > from Peter Ziljstra > > The whole thing relies on: > > &per_cpu_ptr(msblk->stream, cpu)->lock == per_cpu_ptr(&msblk->stream->lock, cpu) > > Which is true because the lhs: > > (local_lock_t *)((zone->per_cpu_pages + per_cpu_offset(cpu)) + offsetof(struct per_cpu_pages, lock)) > > and the rhs: > > (local_lock_t *)((zone->per_cpu_pages + offsetof(struct per_cpu_pages, lock)) + per_cpu_offset(cpu)) > > are identical, because addition is associative. > > More details are included in mmzone.h. This embedding is not completely > free for three reasons. > > 1. As local_lock does not return a per-cpu structure, the PCP has to > be looked up twice -- first to acquire the lock and again to get the > PCP pointer. > > 2. For PREEMPT_RT and CONFIG_DEBUG_LOCK_ALLOC, local_lock is potentially > a spinlock or has lock-specific tracking. In both cases, it becomes > necessary to release/acquire different locks when freeing a list of > pages in free_unref_page_list. Looks like this pattern could benefit from a local_lock API helper that would do the right thing? It probably couldn't optimize much the CONFIG_PREEMPT_RT case which would need to be unlock/lock in any case, but CONFIG_DEBUG_LOCK_ALLOC could perhaps just keep the IRQ's disabled and just note the change of what's acquired? > 3. For most kernel configurations, local_lock_t is empty and no storage is > required. By embedding the lock, the memory consumption on PREEMPT_RT > and CONFIG_DEBUG_LOCK_ALLOC is higher. But I wonder, is there really a benefit to this increased complexity? Before the patch we had "pagesets" - a local_lock that protects all zones' pcplists. Now each zone's pcplists have own local_lock. On !PREEMPT_RT we will never take the locks of multiple zones from the same CPU in parallel, because we use local_lock_irqsave(). Can that parallelism happen on PREEMPT_RT, because that could perhaps justify the change? > Suggested-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> > ---