Maybe the swapcache could somehow abstract that? We currently have the swap
slot allocator, which assigns slots to pages.
Assuming we have a 16 KiB BS but a 4 KiB page, we might have various options
to explore.
For example, we could size swap slots at 16 KiB and assign even a 4 KiB
page a full slot. This would waste swap space with small folios; the waste
would go away with large folios.
So batching order-0 folios in bigger slots that match the FS BS (e.g. 16
KiB) to perform disk writes, right?
Batching might be one idea, but the first idea I raised here would be
that the swap slot size will match the BS (e.g., 16 KiB) and contain at
most one folio.
So an order-0 folio would get a single slot assigned and effectively
"waste" 12 KiB of disk space.
An order-2 folio would get a single slot assigned and not waste any disk
space.
An order-3 folio would get two slots assigned etc. (similar to how it is
done today for non-order-0 folios)
So the penalty for using small folios would be more wasted disk space on
such devices.
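
To make the arithmetic concrete, here is a rough sketch (Python, purely
illustrative, not kernel code). It assumes a 4 KiB base page and a swap slot
size equal to a 16 KiB BS; the function name slots_and_waste() is made up
for this example.

```python
PAGE_SIZE = 4 * 1024     # 4 KiB base page (assumption from this thread)
BLOCK_SIZE = 16 * 1024   # 16 KiB device BS, used here as the swap slot size


def slots_and_waste(order, page_size=PAGE_SIZE, slot_size=BLOCK_SIZE):
    """Return (slots, wasted_bytes) for one folio of the given order.

    Hypothetical helper: each slot holds at most one folio, and a folio
    larger than a slot takes as many whole slots as it needs.
    """
    folio_bytes = page_size << order
    slots = -(-folio_bytes // slot_size)  # ceiling division
    return slots, slots * slot_size - folio_bytes


if __name__ == "__main__":
    for order in range(4):
        slots, waste = slots_and_waste(order)
        print(f"order-{order}: {slots} slot(s), {waste // 1024} KiB wasted")
```

With these numbers, order-0 wastes 12 KiB, order-1 wastes 8 KiB, and order-2
and order-3 waste nothing (one and two slots respectively).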
Can we also assign different orders
to the same slot?
I guess yes.
And can we batch folios while keeping alignment to the
BS (IU)?
I assume with "batching" you would mean that we could actually have
multiple folios inside a single BS, like up to 4 order-0 folios in a
single 16 KiB block? That might be one way of doing it, although I
suspect this can get a bit complicated.
IIUC, we can perform 4 KiB read/write, but we must only have a single
write per block, because otherwise we might get the RMW problems,
correct? Then, maybe a mechanism to guarantee that only a single swap
writeback within a BS can happen at one point in time might also be an
alternative.
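
A very rough sketch of what such a "one writeback per BS at a time" gate
could look like (Python, illustrative only). The class and method names are
invented here, and it assumes 4 KiB swap slots with four slots per 16 KiB
block; a real implementation would have to live in the swap writeback path
and deal with waiting and retries.

```python
import threading


class BlockWritebackGate:
    """Hypothetical gate: at most one in-flight writeback per BS-sized block."""

    def __init__(self, slots_per_block=4):
        self.slots_per_block = slots_per_block
        self._busy = set()              # block indices with a write in flight
        self._lock = threading.Lock()   # protects _busy

    def try_begin(self, slot_offset):
        """Claim the block containing slot_offset; False if already busy."""
        block = slot_offset // self.slots_per_block
        with self._lock:
            if block in self._busy:
                return False
            self._busy.add(block)
            return True

    def end(self, slot_offset):
        """Release the block containing slot_offset after the write completes."""
        block = slot_offset // self.slots_per_block
        with self._lock:
            self._busy.discard(block)
```

A caller that gets False from try_begin() would have to defer or requeue the
folio, which is where the complexity would likely show up.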
If we stick to 4 KiB swap slots, maybe pageout() could be taught to
effectively writeback "everything" residing in the relevant swap slots that
span a BS?
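
For illustration, the set of 4 KiB slots that share a block with a given slot
could be computed like this (Python sketch; block_slot_range() is a made-up
name, and it assumes slot offset 0 starts on a BS boundary on the device):

```python
SLOT_SIZE = 4 * 1024      # 4 KiB swap slot
BLOCK_SIZE = 16 * 1024    # 16 KiB device BS (assumption from this thread)
SLOTS_PER_BLOCK = BLOCK_SIZE // SLOT_SIZE


def block_slot_range(slot_offset, slots_per_block=SLOTS_PER_BLOCK):
    """Return the slot offsets covering the block that contains slot_offset."""
    start = slot_offset - (slot_offset % slots_per_block)  # round down to BS
    return range(start, start + slots_per_block)


if __name__ == "__main__":
    # Writing back slot 5 would mean writing slots 4..7 together.
    print(list(block_slot_range(5)))
```

Everything in that range would need to be written in one go, so any slots in
it that are in use would have to be present (or read back) at writeback time.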
I recall there was a discussion about atomic writes involving multiple
pages, and how it is hard. Maybe with swapping it is "easier"? Absolutely no
expert on that, unfortunately. Hoping Chris has some ideas.
Not sure about the discussion, but I guess the main concern for atomic
writes and swapping is the alignment and the questions I raised above.
Yes, I think that's similar.
I recall that we have been talking about a better swap abstraction for years
:)
Adding Chris Li to the cc list in case he has more input.
Might be a good topic for LSF/MM (might or might not be a better place than
the MM alignment session).
Both options work for me. LSF/MM is in 12 weeks, so having a session
before then would be great.
Both work for me.
Can we start by scheduling this topic for the next available MM session?
Would be great to get initial feedback/thoughts/concerns, etc while we
keep this thread going on.
Yeah, it would probably be great to present the problem and the exact
constraints we have (e.g., things stupid me asks above regarding actual
sizes in which we can perform reads and writes), so we can discuss
possible solutions.
@David R., is the slot in two weeks already taken?
--
Cheers,
David / dhildenb