On Tue, Jan 07, 2025 at 11:31:05AM +0100, David Hildenbrand wrote:
> On 07.01.25 10:43, Daniel Gomez wrote:
> > Hi,
>
> Hi,
>
> >
> > High-capacity SSDs require writes to be aligned with the drive's
> > indirection unit (IU), which is typically >4 KiB, to avoid RMW. To
> > support swap on these devices, we need to ensure that writes do not
> > cross IU boundaries. So, I think this may require increasing the minimum
> > allocation size for swap users.
>
> How would we handle swapout/swapin when we have smaller pages (just imagine
> someone does a mmap(4KiB))?

Swapout would need to be aligned to the IU. A 4 KiB mmap would have to
result in an IU-sized write, e.g. 16 KiB or 32 KiB, to avoid any potential
RMW penalty. So, I think aligning the mmap allocation to the IU would
guarantee a write of the required granularity and alignment. But let's also
look at your suggestion below with swapcache.

Swapin can still be performed at LBA format granularity (e.g. 4 KiB)
without the write penalty implications; I/Os that do not conform to the IU
boundaries only affect performance. So, reading at IU boundaries is
preferred to get optimal performance, not a 'requirement'.

>
> Could this be something that gets abstracted/handled by the swap
> implementation? (i.e., multiple small folios get added to the swapcache but
> get written out / read in as a single unit?).

Do you mean merging like in the block layer? I'm not entirely sure this
could guarantee the I/O boundaries as deterministically as min order large
folio allocations do in the page cache. But I guess it is worth exploring
as an optimization.

>
> I recall that we have been talking about a better swap abstraction for years
> :)

Adding Chris Li to the cc list in case he has more input.

>
> Might be a good topic for LSF/MM (might or might not be a better place than
> the MM alignment session).

Both options work for me. LSF/MM is in 12 weeks, so having a session before
then would be great.
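To make the alignment arithmetic above concrete, here is a minimal sketch
in plain userspace C. It is not kernel code: the helper names are
hypothetical, and it assumes the IU is a power of two (e.g. 16 KiB). It
only shows how a 4 KiB write would have to grow to cover whole IUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; these are not kernel APIs. */

/* True if iu is a non-zero power of two (assumed by every helper below). */
static bool iu_is_valid(uint64_t iu)
{
	return iu != 0 && (iu & (iu - 1)) == 0;
}

/* Round a byte offset down to the start of the IU that contains it. */
static uint64_t iu_align_down(uint64_t off, uint64_t iu)
{
	assert(iu_is_valid(iu));
	return off & ~(iu - 1);
}

/* Round a byte offset up to the next IU boundary (no-op if already aligned). */
static uint64_t iu_align_up(uint64_t off, uint64_t iu)
{
	assert(iu_is_valid(iu));
	return (off + iu - 1) & ~(iu - 1);
}

/* A write covers only whole IUs if both its offset and length are IU multiples. */
static bool write_is_iu_conformant(uint64_t off, uint64_t len, uint64_t iu)
{
	return len != 0 &&
	       iu_align_down(off, iu) == off &&
	       iu_align_down(len, iu) == len;
}

/* Number of bytes the write must span so that it covers whole IUs. */
static uint64_t iu_padded_len(uint64_t off, uint64_t len, uint64_t iu)
{
	return iu_align_up(off + len, iu) - iu_align_down(off, iu);
}
```

With a 16 KiB IU, iu_padded_len(0, 4096, 16384) gives 16384, i.e. the
4 KiB page from the mmap example turns into one full-IU write, while
write_is_iu_conformant(0, 4096, 16384) is false for the unpadded write.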

Daniel

>
> --
> Cheers,
>
> David / dhildenb
>