On Wed, Jan 8, 2025 at 1:24 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 08.01.25 22:19, Chris Li wrote:
> > On Wed, Jan 8, 2025 at 12:36 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >>>> Maybe the swapcache could somehow abstract that? We currently have the swap
> >>>> slot allocator, that assigns slots to pages.
> >>>>
> >>>> Assuming we have a 16 KiB BS but a 4 KiB page, we might have various options
> >>>> to explore.
> >>>>
> >>>> For example, we could size swap slots 16 KiB, and assign even 4 KiB pages a
> >>>> single slot. This would waste swap space with small folios, that would go
> >>>> away with large folios.
> >>>
> >>> So batching order-0 folios in bigger slots that match the FS BS (e.g. 16
> >>> KiB) to perform disk writes, right?
> >>
> >> Batching might be one idea, but the first idea I raised here would be
> >> that the swap slot size will match the BS (e.g., 16 KiB) and contain at
> >> most one folio.
> >>
> >> So a order-0 folio would get a single slot assigned and effectively
> >> "waste" 12 KiB of disk space.
> >
> > I prefer not to "waste" that. It will be wasted on the write
> > amplification as well.
>
> If it can be implemented fairly easily, sure! :)
>
> Looking forward to hearing about the proposal!

Hi David,

Sorry, I have been pretty busy with other work-related stuff recently
and have not had a chance to do the write-up yet. I might not be able
to make next Wednesday's upstream alignment meeting for this topic.
Adding Kairui to the CC list; I have been collaborating with him on
the swap-related changes.

I do see a benefit in separating the swap cache part of the swap
entries (virtual) from the block layer write locations (physical). So
the current swap allocator allocates the virtual swap entries and
still keeps the property that swap entries are contiguous within a
folio. The virtual swap entry also owns the current swap count and
swap cache reclaim. A lookup array then translates the virtual entry
to the physical location.
The physical location also needs an allocator, but a much simpler one.
Physical location allocation does not participate in swap cache
reclaim; that happens at the virtual entry level. Nor does it carry
the swap count, only 1 bit of information: used or not. The physical
entry allocation does not need to be contiguous within the folio
either.

This redirection layer provides the flexibility to do more, e.g. to
bridge the block size gap between the virtual entry and the physical
entry. It can provide an IO batching layer that merges more than one
virtual swap entry into a larger physical write block. Similarly, it
can allow swap to write out compressed zswap/zram data to the SSD,
using similar IO batching.

The memory overhead is 4 bytes per swap entry for the lookup table,
plus maybe 1 bit per physical entry to track whether that location is
used.

That is the key part of the idea. Other ideas, like dynamically
growing the vmalloc array pages, can be viewed as incremental local
improvements; they do not change the core data structure of swap much.

Chris
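
To make the redirection idea above a bit more concrete, here is a rough
userspace sketch in C. It is only an illustration under stated
assumptions, not kernel code: every name in it (struct swap_redirect,
redirect_map(), PHYS_NONE, and so on) is hypothetical, the linear
bitmap scan is a stand-in for whatever physical allocator would really
be used, and swap count, swap cache reclaim and IO batching are left
out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical marker: this virtual entry has no physical slot yet. */
#define PHYS_NONE UINT32_MAX

/* Hypothetical redirection state for one swap device. */
struct swap_redirect {
	uint32_t *virt_to_phys;	/* lookup array: 4 bytes per virtual entry */
	uint8_t *phys_used;	/* bitmap: 1 bit per physical slot */
	uint32_t nr_virt;
	uint32_t nr_phys;
};

static int redirect_init(struct swap_redirect *r, uint32_t nr_virt,
			 uint32_t nr_phys)
{
	r->nr_virt = nr_virt;
	r->nr_phys = nr_phys;
	r->virt_to_phys = malloc((size_t)nr_virt * sizeof(*r->virt_to_phys));
	r->phys_used = calloc(((size_t)nr_phys + 7) / 8, 1);
	if (!r->virt_to_phys || !r->phys_used) {
		free(r->virt_to_phys);
		free(r->phys_used);
		return -1;
	}
	for (uint32_t i = 0; i < nr_virt; i++)
		r->virt_to_phys[i] = PHYS_NONE;
	return 0;
}

/* Physical allocator: tracks only "used or not", no swap count. */
static uint32_t phys_alloc(struct swap_redirect *r)
{
	for (uint32_t p = 0; p < r->nr_phys; p++) {
		uint8_t mask = (uint8_t)(1u << (p % 8));

		if (!(r->phys_used[p / 8] & mask)) {
			r->phys_used[p / 8] |= mask;
			return p;
		}
	}
	return PHYS_NONE;	/* device full */
}

static void phys_free(struct swap_redirect *r, uint32_t p)
{
	assert(p < r->nr_phys);
	r->phys_used[p / 8] &= (uint8_t)~(1u << (p % 8));
}

/* Translate a virtual entry; PHYS_NONE if unmapped or out of range. */
static uint32_t redirect_lookup(const struct swap_redirect *r, uint32_t virt)
{
	return virt < r->nr_virt ? r->virt_to_phys[virt] : PHYS_NONE;
}

/*
 * Bind a virtual entry to a physical slot at write-out time. Virtual
 * entries of one folio stay contiguous; the physical slots they get
 * here need not be.
 */
static uint32_t redirect_map(struct swap_redirect *r, uint32_t virt)
{
	if (virt >= r->nr_virt)
		return PHYS_NONE;
	if (r->virt_to_phys[virt] == PHYS_NONE)
		r->virt_to_phys[virt] = phys_alloc(r);
	return r->virt_to_phys[virt];
}

/* Drop the physical slot when the virtual entry no longer needs it. */
static void redirect_unmap(struct swap_redirect *r, uint32_t virt)
{
	if (virt < r->nr_virt && r->virt_to_phys[virt] != PHYS_NONE) {
		phys_free(r, r->virt_to_phys[virt]);
		r->virt_to_phys[virt] = PHYS_NONE;
	}
}
```

In this sketch the only per-entry state is the uint32_t in the lookup
array and one bit in the bitmap, which is where the 4 bytes plus 1 bit
overhead estimate comes from.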