Re: Swap Min Order

Daniel Gomez <da.gomez@xxxxxxxxxxx> · Tue, 7 Jan 2025 13:29:31 +0100

On Tue, Jan 07, 2025 at 11:31:05AM +0100, David Hildenbrand wrote:
> On 07.01.25 10:43, Daniel Gomez wrote:
> > Hi,
> 
> Hi,
> 
> > 
> > High-capacity SSDs require writes to be aligned with the drive's
> > indirection unit (IU), which is typically >4 KiB, to avoid RMW. To
> > support swap on these devices, we need to ensure that writes do not
> > cross IU boundaries. So, I think this may require increasing the minimum
> > allocation size for swap users.
> 
> How would we handle swapout/swapin when we have smaller pages (just imagine
> someone does a mmap(4KiB))?

Swapout would need to be aligned to the IU. An mmap of 4 KiB would
have to perform an IU-sized write, e.g. 16 KiB or 32 KiB, to avoid any
potential RMW penalty. So, I think aligning the mmap allocation to the
IU would guarantee a write of the required granularity and alignment.
But let's also look at your suggestion below with swapcache.
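
To make the arithmetic concrete, here is a rough sketch (plain Python,
not kernel code) of how a small write would be widened to cover whole
IUs. The helper name iu_aligned_range and the 16 KiB IU are assumptions
for illustration only; real devices report their own IU size.

```python
from typing import Tuple

KIB = 1024


def iu_aligned_range(offset: int, length: int, iu: int) -> Tuple[int, int]:
    """Return (start, length) of the smallest IU-aligned range that
    covers the byte range [offset, offset + length)."""
    if iu <= 0 or iu & (iu - 1):
        raise ValueError("IU size must be a power of two")
    if length <= 0:
        raise ValueError("length must be positive")
    start = offset & ~(iu - 1)                       # round down to IU
    end = (offset + length + iu - 1) & ~(iu - 1)     # round up to IU
    return start, end - start


# Hypothetical example: a 4 KiB page at byte offset 4 KiB on a device
# with a 16 KiB IU ends up as one full 16 KiB write starting at 0.
start, size = iu_aligned_range(4 * KIB, 4 * KIB, 16 * KIB)
```

With these inputs the 4 KiB write becomes a single 16 KiB write, which
is the granularity the drive needs to avoid RMW.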

Swapin can still be performed at LBA format levels (e.g. 4 KiB) without
the same write penalty implications; performance is only affected
if I/Os do not conform to these boundaries. So, reading at IU
boundaries is preferred to get optimal performance, not a 'requirement'.
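
As a small illustration of what "conformant" means here, the sketch
below (plain Python, not kernel code) checks whether an I/O is fully
IU-aligned and whether it straddles an IU boundary. The function names
and the 16 KiB IU are assumptions chosen for the example.

```python
KIB = 1024


def is_iu_conformant(offset: int, length: int, iu: int) -> bool:
    """True if the I/O starts on an IU boundary and covers whole IUs."""
    return length > 0 and offset % iu == 0 and length % iu == 0


def crosses_iu_boundary(offset: int, length: int, iu: int) -> bool:
    """True if the byte range [offset, offset + length) touches more
    than one IU."""
    if length <= 0:
        raise ValueError("length must be positive")
    return offset // iu != (offset + length - 1) // iu


# Hypothetical 16 KiB IU: a 4 KiB read at offset 4 KiB is valid at the
# LBA level and stays inside one IU, but it is not IU-conformant.
iu = 16 * KIB
small_read_conformant = is_iu_conformant(4 * KIB, 4 * KIB, iu)
small_read_crosses = crosses_iu_boundary(4 * KIB, 4 * KIB, iu)
```

A read like this is still allowed; it is only the sub-IU write case
that would trigger RMW on the device.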

> 
> Could this be something that gets abstracted/handled by the swap
> implementation? (i.e., multiple small folios get added to the swapcache but
> get written out / read in as a single unit?).

Do you mean merging like in the block layer? I'm not entirely sure
this could guarantee the I/O boundaries as deterministically as min
order large folio allocations do in the page cache. But I guess it
is worth exploring as an optimization.
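
To illustrate the concern, here is a toy model (plain Python, not how
the swap code or block layer actually works) that groups 4 KiB swap
slots by the IU they fall into. The function name, the slot numbers and
the 4-slots-per-IU ratio are all assumptions for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_slots_by_iu(
    slots: Iterable[int], slots_per_iu: int
) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Split dirty slot numbers into IUs that are completely covered
    (can be written as one IU-sized unit) and IUs that are only
    partially covered (would still need RMW or padding)."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for slot in sorted(set(slots)):
        groups[slot // slots_per_iu].append(slot)
    full = {i: g for i, g in groups.items() if len(g) == slots_per_iu}
    partial = {i: g for i, g in groups.items() if len(g) < slots_per_iu}
    return full, partial


# Hypothetical case: 16 KiB IU with 4 KiB slots, so 4 slots per IU.
full, partial = group_slots_by_iu([0, 1, 2, 3, 5, 6], slots_per_iu=4)
```

In this example IU 0 is fully covered, but IU 1 only has slots 5 and 6,
so opportunistic merging alone leaves a partial IU behind.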

> 
> I recall that we have been talking about a better swap abstraction for years
> :)

Adding Chris Li to the cc list in case he has more input.

> 
> Might be a good topic for LSF/MM (might or might not be a better place than
> the MM alignment session).

Both options work for me. LSF/MM is in 12 weeks, so having a session
before then would be great.

Daniel

> 
> -- 
> Cheers,
> 
> David / dhildenb
>