Kairui Song <ryncsn@xxxxxxxxx> writes:

> From: Kairui Song <kasong@xxxxxxxxxxx>
>
> Interestingly the major performance overhead of synchronous is actually
> from the workingset nodes update, that's because synchronous swap in

If it's the major overhead, why not make it the first optimization?

> keeps adding single folios into a xa_node, making the node no longer
> a shadow node and have to be removed from shadow_nodes, then remove
> the folio very shortly and making the node a shadow node again,
> so it has to add back to the shadow_nodes.

The folio is removed only if should_try_to_free_swap() returns true?

> Mark synchronous swapin folio with a special bit in swap entry embedded
> in folio->swap, as we still have some usable bits there. Skip workingset
> node update on insertion of such folio because it will be removed very
> quickly, and will trigger the update ensuring the workingset info is
> eventual consensus.

Is this safe? Is it possible for the shadow node to be reclaimed after
the folio is added into the node but before it is removed? If so, we may
need to consider some other method. Make shadow_nodes per-CPU?

> Test result of sequential swapin/out of 30G zero page on ZRAM:
>
>                Before (us)   After (us)
> Swapout:       33853883      33886008
> Swapin:        38336519      32465441 (+15.4%)
> Swapout (THP): 6814619       6899938
> Swapin (THP):  38383367      33193479 (+13.6%)

[snip]

--
Best Regards,
Huang, Ying