Hi Yosry, On Wed, Mar 01, 2023 at 04:30:22PM -0800, Yosry Ahmed wrote: > > Can you provide a bit more detail? I am curious how this swap id > > maps into the swap_desc? Is the swp_entry_t cast into "struct > > swap_desc*" or going through some lookup table/tree? > > swap id would be an index in a radix tree (aka xarray), which contains > a pointer to the swap_desc struct. This lookup should be free with > this design as we also use swap_desc to directly store the swap cache > pointer, so this lookup essentially replaces the swap cache lookup. Thanks for the additional clarification. If you don't mind, I have some follow up questions. Is this radix tree global or has multiple small trees (e.g per swap device)? > > > as our abstraction layer. All MM code not concerned with swapping > > > details would operate in terms of swap descs. The swap_desc can point > > > to either a normal swap entry (associated with a swapfile) or a zswap > > > entry. It can also include all non-backend specific operations, such > > > as the swapcache (which would be a simple pointer in swap_desc), swap > > > > Does the zswap entry still use the swap slot cache and swap_info_struct? > > In this design no, it shouldn't. So the zswap entry only shares the swap cache with normal swap entry. That help me paint a better picture how you are going to do the indirection layers. > That's what I could think of at this point. My idea was something like this: > > struct swap_desc { > union { /* Use one bit to distinguish them */ > swp_entry_t swap_entry; > struct zswap_entry *zswap_entry; > }; > struct folio *swapcache; > atomic_t swap_count; > u32 id; > } > > Having the id in the swap_desc is convenient as we can directly map > the swap_desc to a swp_entry_t to place in the page tables, but I > don't think it's necessary. Without it, the struct size is 20 bytes, > so I think the extra 4 bytes are okay to use anyway if the slab > allocator only allocates multiples of 8 bytes. The whole complexity of the swap_count continues is trying to save a few bytes one swap entry has one low count numbers. This seems much more heavy weight compare to that. > The idea here is to unify the swapcache and swap_count implementation > between different swap backends (swapfiles, zswap, etc), which would > create a better abstraction and reduce reinventing the wheel. Same goal here. I am just trying to find ways to use less memory, for users who don't use the indirection. > Keep in mind that the current overhead is 1 byte O(max swap pages) not > O(swapped). Also, 1 byte is assuming we do not use the swap > continuation pages. If we do, it may end up being more. We also > allocate continuation in full 4k pages, so even if one swap_map > element in a page requires continuation, we will allocate an entire > page. What I am trying to say is that to get an actual comparison you > need to also factor in the swap utilization and the rate of usage of > swap continuation. I don't know how to come up with a formula for this > tbh. I would consider two extreme cases of memory usage first. 1) Have swap file size N and no swapping at all. 2) The swap file is full. (aka per page swapping memory overhead). Your proposal will likely do well on 1) because it is dynamic allocated. but worse in 2) due to the extra 20 or so bytes per swap_desc. > Also, like Johannes said, the worst case overhead (32 bytes if you > count the reverse mapping) is 0.8% of swapped memory, aka 8M for every > 1G swapped. It doesn't sound *very* bad. I understand that it is pure > overhead for people not using zswap, but it is not very awful. I might have an alternative to avoid increasing memory usage if not use the zswap. > > It seems what you really need is one bit of information to indicate > > this page is backed by zswap. Then you can have a seperate pointer > > for the zswap entry. > > If you use one bit in swp_entry_t (or one of the available swap types) > to indicate whether the page is backed with a swapfile or zswap it > doesn't really work. We lose the indirection layer. How do we move the > page from zswap to swapfile? We need to go update the page tables and > the shmem page cache, similar to swapoff. How about I make a proposal and you can help me poke holes on it? I fail to see how it can't move a page from zswap to swapfile, yet. Chris