Re: [PATCH 5/6] mm/page_alloc: Protect PCP lists with a spinlock

Nicolas Saenz Julienne <nsaenzju@xxxxxxxxxx> · Fri, 13 May 2022 14:22:28 +0200



On Thu, 2022-05-12 at 09:50 +0100, Mel Gorman wrote:
> Currently the PCP lists are protected by using local_lock_irqsave to
> prevent migration and IRQ reentrancy but this is inconvenient. Remote
> draining of the lists is impossible and a workqueue is required and
> every task allocation/free must disable then enable interrupts which
> is
> expensive.
> 
> As preparation for dealing with both of those problems, protect the
> lists with a spinlock. The IRQ-unsafe version of the lock is used
> because IRQs are already disabled by local_lock_irqsave. spin_trylock
> is used in preparation for a time when local_lock could be used
> instead
> of lock_lock_irqsave.
> 
> The per_cpu_pages still fits within the same number of cache lines
> after
> this patch relative to before the series.
> 
> struct per_cpu_pages {
>         spinlock_t                 lock;                 /*     0    
> 4 */
>         int                        count;                /*     4    
> 4 */
>         int                        high;                 /*     8    
> 4 */
>         int                        batch;                /*    12    
> 4 */
>         short int                  free_factor;          /*    16    
> 2 */
>         short int                  expire;               /*    18    
> 2 */
> 
>         /* XXX 4 bytes hole, try to pack */
> 
>         struct list_head           lists[13];            /*    24  
> 208 */
> 
>         /* size: 256, cachelines: 4, members: 7 */
>         /* sum members: 228, holes: 1, sum holes: 4 */
>         /* padding: 24 */
> } __attribute__((__aligned__(64)));
> 
> There is overhead in the fast path due to acquiring the spinlock even
> though the spinlock is per-cpu and uncontended in the common case.
> Page
> Fault Test (PFT) running on a 1-socket reported the following results
> on
> a 1 socket machine.
> 
>                                      5.18.0-rc1               5.18.0-
> rc1
>                                         vanilla         mm-pcpdrain-
> v2r1
> Hmean     faults/sec-1   886331.5718 (   0.00%)   885462.7479 (  -
> 0.10%)
> Hmean     faults/sec-3  2337706.1583 (   0.00%)  2332130.4909 *  -
> 0.24%*
> Hmean     faults/sec-5  2851594.2897 (   0.00%)  2844123.9307 (  -
> 0.26%)
> Hmean     faults/sec-7  3543251.5507 (   0.00%)  3516889.0442 *  -
> 0.74%*
> Hmean     faults/sec-8  3947098.0024 (   0.00%)  3916162.8476 *  -
> 0.78%*
> Stddev    faults/sec-1     2302.9105 (   0.00%)     2065.0845 ( 
> 10.33%)
> Stddev    faults/sec-3     7275.2442 (   0.00%)     6033.2620 ( 
> 17.07%)
> Stddev    faults/sec-5    24726.0328 (   0.00%)    12525.1026 ( 
> 49.34%)
> Stddev    faults/sec-7     9974.2542 (   0.00%)     9543.9627 (  
> 4.31%)
> Stddev    faults/sec-8     9468.0191 (   0.00%)     7958.2607 ( 
> 15.95%)
> CoeffVar  faults/sec-1        0.2598 (   0.00%)        0.2332 ( 
> 10.24%)
> CoeffVar  faults/sec-3        0.3112 (   0.00%)        0.2587 ( 
> 16.87%)
> CoeffVar  faults/sec-5        0.8670 (   0.00%)        0.4404 ( 
> 49.21%)
> CoeffVar  faults/sec-7        0.2815 (   0.00%)        0.2714 (  
> 3.60%)
> CoeffVar  faults/sec-8        0.2399 (   0.00%)        0.2032 ( 
> 15.28%)
> 
> There is a small hit in the number of faults per second but given
> that
> the results are more stable, it's borderline noise.
> 
> Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> Tested-by: Minchan Kim <minchan@xxxxxxxxxx>
> Acked-by: Minchan Kim <minchan@xxxxxxxxxx>
> ---

Reviewed-by: Nicolas Saenz Julienne <nsaenzju@xxxxxxxxxx>

Thanks,

-- 
Nicolás Sáenz