On Wed, Jan 8, 2025 at 12:36 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> >> Maybe the swapcache could somehow abstract that? We currently have the swap
> >> slot allocator, that assigns slots to pages.
> >>
> >> Assuming we have a 16 KiB BS but a 4 KiB page, we might have various options
> >> to explore.
> >>
> >> For example, we could size swap slots 16 KiB, and assign even 4 KiB pages a
> >> single slot. This would waste swap space with small folios, that would go
> >> away with large folios.
> >
> > So batching order-0 folios in bigger slots that match the FS BS (e.g. 16
> > KiB) to perform disk writes, right?
>
> Batching might be one idea, but the first idea I raised here would be
> that the swap slot size will match the BS (e.g., 16 KiB) and contain at
> most one folio.
>
> So a order-0 folio would get a single slot assigned and effectively
> "waste" 12 KiB of disk space.

I prefer not to "waste" that. It will be wasted on the write
amplification as well.

>
> An order-2 folio would get a single slot assigned and not waste any memory.
>
> An order-3 folio would get two slots assigned etc. (similar to how it is
> done today for non-order-0 folios)
>
> So the penalty for using small folios would be more wasted disk space on
> such devices.
> > Can we also assign different orders
> > to the same slot?
>
> I guess yes.
> > And can we batch folios while keeping alignment to the
> > BS (IU)?
>
> I assume with "batching" you would mean that we could actually have
> multiple folios inside a single BS, like up to 4 order-0 folios in a
> single 16 KiB block? That might be one way of doing it, although I
> suspect this can get a bit complicated.

That would be my preference. BTW, another use case is that if we
want to write compressed swap entries into the SSD (to reduce the wear
on the SSD), we will also end up with a similar situation where we want
to combine multiple swap entries into a write unit.
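
To make the trade-off between the two options concrete, here is a
back-of-the-envelope sketch in Python. It assumes the numbers used in
the examples above (4 KiB base page, 16 KiB BS); the function names are
made up for illustration and are not kernel interfaces.

```python
PAGE_SIZE = 4 * 1024     # assumed base page size (4 KiB)
BLOCK_SIZE = 16 * 1024   # assumed device BS / IU (16 KiB)


def folio_bytes(order):
    """Size in bytes of a folio of the given order."""
    return PAGE_SIZE << order


def slots_needed(order):
    """BS-sized slots used when each slot holds at most one folio."""
    return -(-folio_bytes(order) // BLOCK_SIZE)  # ceiling division


def wasted_bytes(order):
    """Disk space left unused in the slots assigned to one folio."""
    return slots_needed(order) * BLOCK_SIZE - folio_bytes(order)


def blocks_if_batched(n_order0_folios):
    """Blocks used when order-0 folios are packed into shared blocks."""
    per_block = BLOCK_SIZE // PAGE_SIZE  # up to 4 order-0 folios per block
    return -(-n_order0_folios // per_block)


if __name__ == "__main__":
    for order in range(4):
        print(f"order-{order}: {slots_needed(order)} slot(s), "
              f"{wasted_bytes(order) // 1024} KiB wasted")
    n = 8
    print(f"{n} order-0 folios: {n * slots_needed(0)} blocks unbatched, "
          f"{blocks_if_batched(n)} blocks batched")
```

With one folio per slot, every order-0 folio costs a full block on
disk; with batching, the same folios share blocks, which is the space
and write amplification saving argued for above.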
>
> IIUC, we can perform 4 KiB read/write, but we must only have a single
> write per block, because otherwise we might get the RMW problems,
> correct? Then, maybe a mechanism to guarantee that only a single swap
> writeback within a BS can happen at one point in time might also be an
> alternative.

Yes, I do see that batching and grouping writes of swap entries is
necessary and useful.

>
> >
> >>
> >> If we stick to 4 KiB swap slots, maybe pageout() could be taught to
> >> effectively writeback "everything" residing in the relevant swap slots that
> >> span a BS?
> >>
> >> I recall there was a discussion about atomic writes involving multiple
> >> pages, and how it is hard. Maybe with swaping it is "easier"? Absolutely no
> >> expert on that, unfortunately. Hoping Chris has some ideas.
> >
> > Not sure about the discussion but I guess the main concern for atomic
> > and swaping is the alignment and the questions I raised above.
>
> Yes, I think that's similar.

Agreed, it is very similar. Both can share a single solution, the
"virtual swapfile". That is my proposal.

>
> >
> >>
> >>
> >>>
> >>>>
> >>>> I recall that we have been talking about a better swap abstraction for years
> >>>> :)
> >>>
> >>> Adding Chris Li to the cc list in case he has more input.
> >>>
> >>>>
> >>>> Might be a good topic for LSF/MM (might or might not be a better place than
> >>>> the MM alignment session).
> >>>
> >>> Both options work for me. LSF/MM is in 12 weeks so, having a previous
> >>> session would be great.
> >>
> >> Both work for me.
> >
> > Can we start by scheduling this topic for the next available MM session?
> > Would be great to get initial feedback/thoughts/concerns, etc while we
> > keep this thread going on.
>
> Yeah, it would probably great to present the problem and the exact
> constraints we have (e.g., things stupid me asks above regarding actual
> sizes in which we can perform reads and writes), so we can discuss
> possible solutions.
>
> @David R., is the slot in two weeks already taken?

Hopefully I can send out the "virtual swapfile" proposal in time and
we can discuss that as one of the possible approaches.

Chris

>
> --
> Cheers,
>
> David / dhildenb
>