Re: [PATCH] mm, slub: splice cpu and page freelists in deactivate_slab()

On Fri, Jan 15, 2021 at 7:35 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> In deactivate_slab() we currently move all but one objects on the cpu freelist
> to the page freelist one by one using the costly cmpxchg_double() operation.
> Then we unfreeze the page while moving the last object on page freelist, with
> a final cmpxchg_double().
>
> This can be optimized to avoid the cmpxchg_double() per object. Just count the
> objects on cpu freelist (to adjust page->inuse properly) and also remember the
> last object in the chain. Then splice page->freelist to the last object and
> effectively add the whole cpu freelist to page->freelist while unfreezing the
> page, with a single cmpxchg_double().

This might have a couple of additional (good) effects, although they
may well be too small to notice:

 - The old code inverted the direction of the freelist, while the new
code preserves the direction.
 - We're no longer dirtying the cachelines of objects in the middle of
the freelist.
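
To illustrate both points, here is a rough userspace sketch. It is not
the kernel code: struct obj, move_one_by_one() and splice_lists() are
made-up names, the locking and the cmpxchg_double() are left out
entirely, and "writes" just counts how many objects get their free
pointer stored to. The old scheme is also simplified in that the last
object is not handled separately.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a free slab object with an embedded free pointer. */
struct obj {
	struct obj *next;
	int id;
};

/*
 * Old scheme (simplified): pop each object off the cpu freelist and push
 * it onto the page freelist. Every object is written to, and the objects
 * end up in reverse order at the head of the page freelist.
 */
static struct obj *move_one_by_one(struct obj *cpu, struct obj *page, int *writes)
{
	while (cpu) {
		struct obj *next = cpu->next;

		cpu->next = page;	/* dirties this object's cacheline */
		(*writes)++;
		page = cpu;
		cpu = next;
	}
	return page;
}

/*
 * New scheme (simplified): walk the cpu freelist once to count the objects
 * and find the tail, then link the tail to the old page freelist. Only the
 * tail object is written to, and the original order is preserved.
 */
static struct obj *splice_lists(struct obj *cpu, struct obj *page,
				int *moved, int *writes)
{
	struct obj *tail = cpu;
	int n = 1;

	assert(cpu != NULL);
	while (tail->next) {	/* read-only walk over the middle objects */
		tail = tail->next;
		n++;
	}
	tail->next = page;	/* the single store into the chain */
	(*writes)++;
	*moved = n;		/* the caller would use this to adjust inuse */
	return cpu;
}
```

With a cpu freelist A -> B -> C and a page freelist P, the first helper
yields C -> B -> A -> P after three stores, and the second one yields
A -> B -> C -> P after one store.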

In the current code it probably doesn't matter much, since I think we
basically only take this path when handling NUMA mismatches,
PFMEMALLOC stuff, racing new_slab(), and flush_slab() for handling
flushing IPIs? But yeah, if we want to start automatically sending
flush IPIs, it might be a good idea: the next accesses to the page
will probably come from a different CPU (unless the page is entirely
unused, in which case it may be freed to the page allocator's percpu
list), and we don't want to create unnecessary cache/memory traffic.
(And it's a good cleanup regardless, I think.)

> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>

Reviewed-by: Jann Horn <jannh@xxxxxxxxxx>

[...]
>         /*
> -        * Stage two: Ensure that the page is unfrozen while the
> -        * list presence reflects the actual number of objects
> -        * during unfreeze.
> +        * Stage two: Unfreeze the page while splicing the per-cpu
> +        * freelist to the head of page's freelist.
> +        *
> +        * Ensure that the page is unfrozen while the list presence
> +        * reflects the actual number of objects during unfreeze.

(my computer complains about trailing whitespace here)

>          *
>          * We setup the list membership and then perform a cmpxchg
>          * with the count. If there is a mismatch then the page



