On Fri, Jun 16, 2023 at 1:37 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 16.06.23 10:04, Yosry Ahmed wrote:
> > On Fri, Jun 16, 2023 at 12:57 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 16.06.23 09:37, Yosry Ahmed wrote:
> >>> On Thu, Jun 15, 2023 at 9:41 PM 贺中坤 <hezhongkun.hzk@xxxxxxxxxxxxx> wrote:
> >>>>
> >>>>> Thanks Fabian for tagging me.
> >>>>>
> >>>>> I am not familiar with #1, so I will speak to #2. Zhongkun, there are
> >>>>> a few parts that I do not understand -- hopefully you can help me out
> >>>>> here:
> >>>>>
> >>>>> (1) If I understand correctly, in this patch we set the active memcg,
> >>>>> trying to charge any pages allocated in a zspage to the current memcg,
> >>>>> yet that zspage will contain multiple compressed object slots, not
> >>>>> just the one used by this memcg. Aren't we overcharging the memcg?
> >>>>> Basically, the first memcg that happens to allocate the zspage will pay
> >>>>> for all the objects in this zspage, even after it stops using the
> >>>>> zspage completely?
> >>>>
> >>>> It will not overcharge. As you said below, we are not using
> >>>> __GFP_ACCOUNT and charging the compressed slots to the memcgs.
> >>>>
> >>>>>
> >>>>> (2) Patch 3 seems to be charging the compressed slots to the memcgs,
> >>>>> yet this patch is trying to charge the entire zspage. Aren't we double
> >>>>> charging the zspage? I am guessing this isn't happening because (as
> >>>>> Michal pointed out) we are not using __GFP_ACCOUNT here anyway, so
> >>>>> this patch may be a NOP, and the actual charging is coming from patch 3
> >>>>> only.
> >>>>
> >>>> Yes, the actual charging is coming from patch 3. This patch just
> >>>> delivers the BIO page's memcg to the current task, which is not the
> >>>> consumer.
> >>>>
> >>>>>
> >>>>> (3) Zswap recently implemented per-memcg charging of compressed
> >>>>> objects in a much simpler way. If your main interest is #2 (which is
> >>>>> what I understand from the commit log), it seems like zswap might be
> >>>>> providing this already? Why can't you use zswap? Is it the fact that
> >>>>> zswap requires a backing swapfile?
> >>>>
> >>>> Thanks for your reply and review. Yes, zswap requires a backing
> >>>> swapfile. The I/O path is very complex; sometimes it will throttle the
> >>>> whole system if some resources are short, so we hope to use zram.
> >>>
> >>> Is the only problem with zswap for you the requirement of a backing swapfile?
> >>>
> >>> If yes, I am in the early stages of developing a solution to make
> >>> zswap work without a backing swapfile. This was discussed at LSF/MM
> >>> [1]. Would this make zswap usable for your use case?
> >>
> >> Out of curiosity, are there any other known pros/cons when using
> >> zswap-without-swap instead of zram?
> >>
> >> I know that zram requires sizing (size of the virtual block device) and
> >> consumes metadata; zswap doesn't.
> >
> > We don't use zram in our data centers, so I am not an expert on
> > zram, but off the top of my head there are a few more advantages to
> > zswap:
>
> Thanks!
>
> > (1) Better memcg support (which this series is attempting to address
> > in zram, although in a much more complicated way).
>
> Right. I think this patch also misses applying the charging in the
> recompress case (only triggered by user space, IIUC).
>
> >
> > (2) We internally have incompressible memory handling on top of zswap,
> > which is something that we would like to upstream when
> > zswap-without-swap is supported.
> > Basically, if a page does not compress well enough to save memory, we
> > reject it from zswap and make it unevictable (if there is no backing
> > swapfile). The existence of zswap in the MM layer helps with this.
> > Since zram is a block device from the MM perspective, it's more
> > difficult to do something like this. Incompressible pages just sit in
> > zram AFAICT.
>
> I see. With ZRAM_HUGE we still have to store the uncompressed page
> (because it's a block device and has to hold that data).

Right.

>
> >
> > (3) Writeback support. If you're running out of memory to store
> > compressed pages, you can add a swapfile at runtime and zswap will
> > start writing to it, freeing up space to compress more pages. This
> > wouldn't be possible in the same way in zram. Zram supports writing to
> > a backing device, but in a more manual way (userspace has to write to
> > an interface to tell zram to write some pages).
>
> Right, that zram backing device stuff is really sub-optimal and only useful
> in corner cases (most probably not datacenters).
>
> What one can do with zram is to add a second swap device with lower priority.
> Looking at my Fedora machine:
>
> $ cat /proc/swaps
> Filename        Type        Size        Used    Priority
> /dev/dm-2       partition   16588796    0       -2
> /dev/zram0      partition   8388604     0       100
>
>
> Guess the difference here is that you won't be writing out the compressed
> data to the disk, but anything that gets swapped out afterwards will
> end up on the disk. I can see how the zswap behavior might be better in
> that case (instead of swapping out some additional pages, you relocate
> the already-swapped-out-to-zswap pages to the disk).

Yeah, I am hoping we can enable the use of zswap without a backing
swapfile, and I keep seeing use cases that would benefit from that.

>
> --
> Cheers,
>
> David / dhildenb
>
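
For readers not familiar with the charging mechanism discussed above, here
is a minimal, hedged sketch of the remote-charging pattern the thread refers
to: charge an allocation to the memcg that owns the page being written (the
consumer), rather than to whatever task happens to be doing the I/O. The
helper name charged_alloc_for_page() and the use of page_memcg()/kmalloc()
are illustrative assumptions, not the actual series; set_active_memcg() and
__GFP_ACCOUNT are the real kernel primitives being discussed.

#include <linux/memcontrol.h>
#include <linux/sched/mm.h>
#include <linux/slab.h>

/*
 * Illustrative sketch only -- not the actual patch.  Charge an
 * allocation to the memcg that owns @page (e.g. the BIO page being
 * compressed), not to the task performing the I/O.
 */
static void *charged_alloc_for_page(struct page *page, size_t size)
{
        struct mem_cgroup *memcg, *old_memcg;
        void *buf;

        memcg = page_memcg(page);               /* memcg that owns the page */
        old_memcg = set_active_memcg(memcg);    /* open remote-charge window */

        /* __GFP_ACCOUNT charges this allocation to the active memcg */
        buf = kmalloc(size, GFP_KERNEL | __GFP_ACCOUNT);

        set_active_memcg(old_memcg);            /* restore previous memcg */
        return buf;
}

With something along these lines, the memory backing the compressed object
is accounted to the consumer's memcg rather than to the task that happened
to trigger the store, which is the behavior the thread is after.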