[PATCH 2/2] mm/page_alloc: Add remote draining support to per-cpu lists

Nicolas Saenz Julienne <nsaenzju@xxxxxxxxxx> · Tue, 8 Feb 2022 11:07:50 +0100

The page allocator's per-cpu page lists (pcplists) are currently
protected using local_locks. While performance savvy, this doesn't allow
for remote access to these structures. CPUs requiring system-wide
changes to the per-cpu lists get around this by scheduling
workers on each CPU. That said, some setups, like NOHZ_FULL CPUs,
aren't well suited to this since they can't handle interruptions
of any sort.

To mitigate this, replace the current draining mechanism with one that
allows remotely draining the lists:

 - Each CPU now has two pcplists pointers: one that points to a pcplists
   instance that is in-use, 'pcp->lp', another that points to an idle
   and empty instance, 'pcp->drain'. CPUs access their local pcplists
   through 'pcp->lp' and the pointer is dereferenced atomically.

 - When a CPU decides it needs to empty some remote pcplists, it'll
   atomically exchange the remote CPU's 'pcp->lp' and 'pcp->drain'
   pointers. A remote CPU racing with this will either have:

     - An old 'pcp->lp' reference. That instance will soon be emptied by
       the drain process; we just have to wait for the remote CPU to
       finish using it.

     - The new 'pcp->lp' reference, that is, an empty pcplists instance.
       rcu_replace_pointer()'s release semantics ensure any prior
       changes will be visible to the remote CPU, for example: changes
       to 'pcp->high' and 'pcp->batch' when disabling the pcplists.

 - The CPU that started the drain can now wait for an RCU grace period
   to make sure the remote CPU is done using the old pcplists.
   synchronize_rcu() counts as a full memory barrier, so any changes the
   remote CPU makes to the soon-to-be-drained pcplists will be visible
   to the draining CPU once synchronize_rcu() returns.

 - Then the CPU can safely free the old pcplists. Nobody else holds a
   reference to it. Note that concurrent access to the remote pcplists
   drain is protected by the 'pcpu_drain_mutex'.
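
The steps above can be sketched as a small userspace model. This is not
the code from the patch: it uses C11 atomics in place of
rcu_dereference()/rcu_replace_pointer(), and an empty placeholder where
the kernel would call synchronize_rcu(), so it is only meaningful when
run single-threaded. All names (struct pcp_model, pcp_model_*(),
model_wait_for_readers()) are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one pcplists instance. */
struct pcplists_model {
	int count;	/* number of pages currently held */
};

/* Hypothetical per-CPU structure holding the two pointers. */
struct pcp_model {
	_Atomic(struct pcplists_model *) lp;	/* in-use instance */
	struct pcplists_model *drain;		/* idle, empty instance */
	struct pcplists_model lists[2];		/* backing storage */
};

static void pcp_model_init(struct pcp_model *pcp)
{
	pcp->lists[0].count = 0;
	pcp->lists[1].count = 0;
	atomic_init(&pcp->lp, &pcp->lists[0]);
	pcp->drain = &pcp->lists[1];
}

/* Owner-side path: load 'lp' once, then work on that instance only. */
static void pcp_model_add_page(struct pcp_model *pcp)
{
	struct pcplists_model *lp =
		atomic_load_explicit(&pcp->lp, memory_order_acquire);

	lp->count++;
}

/*
 * Placeholder for the grace-period wait. In the kernel this would be
 * synchronize_rcu(); here it does nothing, which is only valid because
 * the model is single-threaded.
 */
static void model_wait_for_readers(void)
{
}

/*
 * Drain-side path, assumed to run with the drain mutex held.
 * Returns the number of pages taken from the old instance.
 */
static int pcp_model_drain_remote(struct pcp_model *pcp)
{
	struct pcplists_model *idle = pcp->drain;
	struct pcplists_model *old;
	int freed;

	/* The idle instance must be empty before it is published. */
	assert(idle->count == 0);

	/* Swap the pointers; the owner now sees the empty instance. */
	old = atomic_exchange_explicit(&pcp->lp, idle,
				       memory_order_acq_rel);
	pcp->drain = old;

	/* Wait until nobody can still be using the old instance. */
	model_wait_for_readers();

	/* Safe to empty the old instance now. */
	freed = old->count;
	old->count = 0;
	return freed;
}
```

After a drain, the previously in-use instance becomes the idle one, so
the two instances simply trade roles on every drain and no allocation
is needed on the drain path.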