On Mon, Nov 18, 2024 at 10:56 PM Sergey Senozhatsky
<senozhatsky@xxxxxxxxxxxx> wrote:
>
> On (24/11/12 09:31), Barry Song wrote:
> [..]
> > > Do you have any data on how this would perform with the upstream
> > > kernel, i.e. without a large folio pool and the workaround, and
> > > whether large granularity compression is worth having without
> > > those patches?
> >
> > I'd say large granularity compression isn't a problem, but large
> > granularity decompression could be.
> >
> > The worst case would be if we swap out a large block, such as 16KB,
> > but end up swapping in 4 times due to allocation failures, falling
> > back to smaller folios. In this scenario, we would need to perform
> > three redundant decompressions. I will work with Tangquan to
> > provide this data this week.
>
> Well, apart from that... I sort of don't know.
>
> This seems to be exclusively for the swap case (or do file-systems use
> mTHP too?) and zram/zsmalloc don't really focus on one particular
> usage scenario; pretty much all of our features can be used regardless
> of what zram is backing up - be it a swap partition or a mounted fs.

Yes, some filesystems also support mTHP. A simple grep command can
list them all:

fs % git grep mapping_set_large_folios
afs/inode.c:            mapping_set_large_folios(inode->i_mapping);
afs/inode.c:            mapping_set_large_folios(inode->i_mapping);
bcachefs/fs.c:          mapping_set_large_folios(inode->v.i_mapping);
erofs/inode.c:          mapping_set_large_folios(inode->i_mapping);
nfs/inode.c:            mapping_set_large_folios(inode->i_mapping);
smb/client/inode.c:     mapping_set_large_folios(inode->i_mapping);
zonefs/super.c:         mapping_set_large_folios(inode->i_mapping);

More filesystems might begin to support large folios over time.

In the current implementation, only the size is considered when
deciding whether to apply large block compression:

static inline bool want_multi_pages_comp(struct zram *zram,
					 struct bio *bio)
{
	u32 index = bio->bi_iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;

	if (bio->bi_io_vec->bv_len >= ZCOMP_MULTI_PAGES_SIZE)
		return true;
	...
}

If we run into too many corner cases with filesystems (such as
excessive recompression or partial reads), we could additionally
require the folio to be anonymous before returning true (see the
sketch in the P.S. below).

For swap, we are working to get things under control. The challenging
scenario that could lead to many partial reads arises when mTHP
allocation fails during swap-in: in that case, do_swap_page() will
swap in only a single small folio, even though the entire 16KB block
has already been decompressed.

> Another thing is that I don't see how to integrate support for these
> large objects with post-processing: recompression and writeback.
> Well, recompression is okay-ish, I guess, but writeback is not.
> Writeback works in PAGE_SIZE units; we get that worst-case scenario
> here. So, yeah, there are many questions.

For zram writeback, my intuition is that we should write back the
entire large block (4 * PAGE_SIZE) at once. If a large block is idle
or marked huge in zram, that state generally applies to the whole
block, so writing it back as one unit seems natural (see the second
sketch below). This isn't implemented yet, likely because writeback
hasn't been enabled on our phones so far.

> p.s. Sorry for late reply. I just started looking at the series and
> don't have any solid opinions yet.

Thank you for starting to review the series. Your suggestions are
greatly appreciated.

Best Regards
Barry
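
P.S. To make the anonymous-folio idea above concrete, here is a rough,
untested sketch of how want_multi_pages_comp() could restrict large
block compression to anonymous memory. page_folio() and
folio_test_anon() are existing kernel helpers; the rest follows the
snippet quoted above, and the checks elided there ("...") would still
need to apply. This is only an illustration, not a tested patch:

static inline bool want_multi_pages_comp(struct zram *zram,
					 struct bio *bio)
{
	struct folio *folio = page_folio(bio->bi_io_vec->bv_page);

	/* Too small for a large block; take the PAGE_SIZE path */
	if (bio->bi_io_vec->bv_len < ZCOMP_MULTI_PAGES_SIZE)
		return false;

	/*
	 * Skip large block compression for file-backed folios so that
	 * filesystem partial reads never hit the multi-page path;
	 * anonymous (swap) folios keep the large-granularity behavior.
	 */
	return folio_test_anon(folio);
}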
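
P.P.S. For writeback, a very rough sketch of what "write back the
whole block at once" could look like in the writeback loop. I'm
assuming a ZCOMP_MULTI_PAGES_NR constant (subpages per large block), a
ZRAM_MULTI_PAGES slot flag and a zram_writeback_multi_pages() helper -
all three are hypothetical names on my side, not part of the series as
posted; only zram_test_flag() and ZRAM_WB exist in zram today:

	for (index = 0; index < nr_pages; index++) {
		if (zram_test_flag(zram, index, ZRAM_MULTI_PAGES)) {
			/* only handle the aligned head slot of a block */
			if (index % ZCOMP_MULTI_PAGES_NR)
				continue;
			/*
			 * Decompress the whole block once, submit one
			 * 4 * PAGE_SIZE bio to the backing device, then
			 * mark all subpage slots ZRAM_WB.
			 */
			zram_writeback_multi_pages(zram, index);
			continue;
		}
		/* existing PAGE_SIZE writeback path */
	}

That would avoid the worst case Sergey describes, at the cost of
writing back idle blocks in 16KB units rather than 4KB ones.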