On Thu, Mar 2, 2023 at 2:36 PM Rik van Riel <riel@xxxxxxxxxxx> wrote:
>
> On Thu, 2023-03-02 at 13:42 -0800, Chris Li wrote:
> > On Thu, Mar 02, 2023 at 01:23:14PM -0500, Rik van Riel wrote:
> > >
> > >
> > > One possible implementation might be to have swap page table
> > > entries point to a swap address in this indirection layer, and
> > > the indirection layer can be an xarray containing the actual
> > > swap entries specifying at which position in which swap device
> > > the data can be found.
> >
> > The questions, do we have this indirection layer apply to all swap
> > entries?
>
>
> I believe we should have a system that tracks every swap entry
> the same, data structure wise. Otherwise we will have two sets
> of code in the kernel, and it will be too easy to get corner
> cases wrong.
>
> > My small tweak is to limit the indirection layer only to non leaf
> > swap devices. Then it is actually very close to what I am proposing.
> > Just your "indirection layer" is my "special swap device".
> >
> > Again, "special swap device" is a very bad name, let's name it
> > something more useful.
> >
> > > That might be a net reduction in the code over what we have today,
> > > because it gets rid of some ugly corner cases.
> >
> > Great.
>
> ... but that won't happen if the indirection layer only applies
> to some swap devices, because we will still need to keep around
> the crazy code to deal with the swap devices that don't have it.

I agree with Rik here. We can certainly special-case the indirection
layer and only apply it to some swap backends (e.g. zswap), but this
makes things more complicated. For example, if each swap backend
maintains swap_count in its own way, we have to hand over the swap
count whenever we move a swapped page between backends. With a common
data structure like the proposed swap_desc, everything becomes easier
to reason about.
The core swapping logic that is agnostic to the backend, like the swap
cache and swap counting, lives in one common place. Swap backends like
swapfiles or zswap can then implement a common interface for
backend-specific operations, like allocating entries, reading/writing
pages, etc.

This, of course, isn't free. There is an associated overhead, and it's
a trade-off like most things are. We want to land on the side of that
trade-off that makes sense: we don't want to incur too much overhead,
but we also don't want a very complicated and error-prone
implementation.

Rik, I am wondering about your thoughts on this proposal and how you
think it can be improved?

>
> --
> All Rights Reversed.
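
To make the swap_count point above concrete, here is a rough userspace
sketch in C. It is not kernel code and not the actual proposal: only
the name swap_desc comes from this thread. struct swap_backend,
struct swap_backend_ops, the toy_* helpers, and swap_desc_move() are
hypothetical names made up for illustration, and the "backends" are
just fixed-size arrays. The sketch assumes the swap count lives in the
common descriptor, so moving a page between backends only has to
update the backend pointer and the slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_SLOTS     4   /* slots per toy backend (illustrative) */
#define TOY_PAGE_SIZE 8   /* bytes per toy "page" (illustrative) */

struct swap_backend;

/* Hypothetical common interface that every backend would implement. */
struct swap_backend_ops {
    long (*alloc_slot)(struct swap_backend *b);   /* returns -1 when full */
    void (*free_slot)(struct swap_backend *b, long slot);
    void (*write_page)(struct swap_backend *b, long slot, const char *page);
    void (*read_page)(struct swap_backend *b, long slot, char *page);
};

/* Toy backend: a fixed array standing in for zswap or a swapfile. */
struct swap_backend {
    const char *name;
    const struct swap_backend_ops *ops;
    bool used[TOY_SLOTS];
    char pages[TOY_SLOTS][TOY_PAGE_SIZE];
};

/* Backend-agnostic state, including the swap count, lives here. */
struct swap_desc {
    struct swap_backend *backend;
    long slot;
    unsigned int swap_count;
};

static long toy_alloc_slot(struct swap_backend *b)
{
    for (long i = 0; i < TOY_SLOTS; i++) {
        if (!b->used[i]) {
            b->used[i] = true;
            return i;
        }
    }
    return -1;
}

static void toy_free_slot(struct swap_backend *b, long slot)
{
    assert(slot >= 0 && slot < TOY_SLOTS);
    b->used[slot] = false;
}

static void toy_write_page(struct swap_backend *b, long slot, const char *page)
{
    memcpy(b->pages[slot], page, TOY_PAGE_SIZE);
}

static void toy_read_page(struct swap_backend *b, long slot, char *page)
{
    memcpy(page, b->pages[slot], TOY_PAGE_SIZE);
}

/* Both toy backends share one implementation; real ones would differ. */
static const struct swap_backend_ops toy_ops = {
    .alloc_slot = toy_alloc_slot,
    .free_slot  = toy_free_slot,
    .write_page = toy_write_page,
    .read_page  = toy_read_page,
};

static void toy_backend_init(struct swap_backend *b, const char *name)
{
    memset(b, 0, sizeof(*b));
    b->name = name;
    b->ops = &toy_ops;
}

/*
 * Move a swapped page to another backend. Only the backend pointer and
 * slot change; swap_count is never touched, so there is no handover.
 * Returns 0 on success, -1 if the destination has no free slot.
 */
static int swap_desc_move(struct swap_desc *desc, struct swap_backend *dst)
{
    char page[TOY_PAGE_SIZE];
    long new_slot = dst->ops->alloc_slot(dst);

    if (new_slot < 0)
        return -1;
    desc->backend->ops->read_page(desc->backend, desc->slot, page);
    dst->ops->write_page(dst, new_slot, page);
    desc->backend->ops->free_slot(desc->backend, desc->slot);
    desc->backend = dst;
    desc->slot = new_slot;
    return 0;
}
```

A caller would initialize two backends with toy_backend_init(), fill in
a swap_desc that points at a slot in the first one, and then call
swap_desc_move() with the second. If each backend kept its own count
instead, swap_desc_move() would also need a backend-specific step to
export the count from the source and import it into the destination,
which is the kind of corner case the common descriptor avoids. The
cost is the extra indirection through the descriptor on every access,
which is the overhead side of the trade-off.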