On (22/11/02 14:06), Minchan Kim wrote: > On Tue, Oct 18, 2022 at 01:55:28PM +0900, Sergey Senozhatsky wrote: > > Allow zram to recompress (using secondary compression streams) > > pages. We support three modes: > > > > 1) IDLE pages recompression is activated by `idle` mode > > > > echo idle > /sys/block/zram0/recompress > > > > 2) Since there may be many idle pages user-space may pass a size > > watermark value (in bytes) and we will recompress IDLE pages only > > of equal or greater size: > > > > echo 888 > /sys/block/zram0/recompress > > Hmm, how about having seperate knob for threshold? Per-my understanding this threshold can change quite often, depending on memory pressure and so on. So we may force user-space to issues more syscalls, without any gain in simplicity. > recompress_threshold > > With that, we could make rescompress 888 and idle/huge > as well as only 888. > > echo 888 > /sys/block/zram0/recompress_threshold > echo 1 > /sys/block/zram0/recompress > > or > > echo 888 > /sys/block/zram0/recompress_threshold > echo idle > /sys/block/zram0/recompress > > or we can introduce the threshold with action item. > > echo "idle 888" > /sys/block/zram0/recompress > echo "huge 888" > /sys/block/zram0/recompress > echo "normal 888" > /sys/block/zram0/recompress I like the latter one, when threshold is an optional argument. I probably would even go a bit further and add keywords: type=STRING threshold=INT > > 3) HUGE pages recompression is activated by `huge` mode > > > > echo huge > /sys/block/zram0/recompress > > > > 4) HUGE_IDLE pages recompression is activated by `huge_idle` mode > > > > echo huge_idle > /sys/block/zram0/recompress > > > > Signed-off-by: Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx> > > --- > > drivers/block/zram/Kconfig | 15 +++ > > drivers/block/zram/zram_drv.c | 196 +++++++++++++++++++++++++++++++++- > > drivers/block/zram/zram_drv.h | 2 + > > 3 files changed, 210 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig > > index d4100b0c083e..3e00656a6f8a 100644 > > --- a/drivers/block/zram/Kconfig > > +++ b/drivers/block/zram/Kconfig > > @@ -78,3 +78,18 @@ config ZRAM_MEMORY_TRACKING > > /sys/kernel/debug/zram/zramX/block_state. > > > > See Documentation/admin-guide/blockdev/zram.rst for more information. > > + > > +config ZRAM_MULTI_COMP > > + bool "Enable multiple per-CPU compression streams" > > per-CPU is implementation detail. Let's do not mention it. OK. > > + depends on ZRAM > > + help > > + This will enable per-CPU multi-compression streams, so that ZRAM > > indentation OK. A question: does this matter? I don't see any problems in menuconfig. > > + can re-compress IDLE/huge pages, using a potentially slower but > > + more effective compression algorithm. Note, that IDLE page support > > + requires ZRAM_MEMORY_TRACKING. > > + > > + echo TIMEOUT > /sys/block/zramX/idle > > + echo SIZE > /sys/block/zramX/recompress > > + > > + SIZE (in bytes) parameter sets the object size watermark: idle > > + objects that are of a smaller size will not get recompressed. > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c > > index 94c62d7ea818..da11560ecf70 100644 > > --- a/drivers/block/zram/zram_drv.c > > +++ b/drivers/block/zram/zram_drv.c > > @@ -1282,6 +1282,12 @@ static void zram_free_page(struct zram *zram, size_t index) > > atomic64_dec(&zram->stats.huge_pages); > > } > > > > + if (zram_test_flag(zram, index, ZRAM_RECOMP)) > > + zram_clear_flag(zram, index, ZRAM_RECOMP); > > + > > + if (zram_test_flag(zram, index, ZRAM_RECOMP_SKIP)) > > + zram_clear_flag(zram, index, ZRAM_RECOMP_SKIP); > > Let's squeeze the comp algo index into meta area since we have > some rooms for the bits. Then can we could remove the specific > recomp two flags? What is meta area? > I am thinking the feature to work incoming pages on the fly, > not only for recompress manual knob so it would be good > if we could make the interface abstract to work something > like this(I may miss something why we couldn't. I need more > time to look into then) > > zram_bvec_write: > > for (i = 0; i < MAX_COMP_ALGO; i++) { > zstrm = zcomp_stream_get(i); > zcomp_compress(src, &comp_len); > if (comp_len > threshold) { > zcomp_stream_put(i); > continue; > } > break; > } > > zram_bvec_read: > algo_idx = zram_get_algo_idx(zram, index); > zstrm = zcomp_stream_get(zram, algo_idx); > zcomp_decompress(zstrm); > zcomp_stream_put(zram, algo_idx); Hmm. This is something that should not be enabled by default. N compressions per every stored page is very very CPU and power intensive. We definitely want a way to have recompression as a user-space event, which gives all sorts of flexibility and extensibility. ZRAM doesn't (and should not) know about too many things, so ZRAM can't make good decisions (and probably should not try). User-space can make good decisions on the other hand. So recompression for us is not something that happens all the time, unconditionally. It's something that happens sometimes, depending on the situation on the host. [..] > > +static int zram_recompress(struct zram *zram, u32 index, struct page *page, > > + int size_watermark) > > +{ > > + unsigned long handle_prev; > > + unsigned long handle_next; > > + unsigned int comp_len_next; > > + unsigned int comp_len_prev; > > How about orig_handle and new_nandle with orig_comp_len and new_comp_len? No opinion. Can we have prev and next? :) [..] > > + if (sysfs_streq(buf, "idle")) { > > + mode = RECOMPRESS_IDLE; > > + } else if (sysfs_streq(buf, "huge")) { > > + mode = RECOMPRESS_HUGE; > > + } else if (sysfs_streq(buf, "huge_idle")) { > > + mode = RECOMPRESS_IDLE | RECOMPRESS_HUGE; > > + } else { > > + /* > > + * We will re-compress only idle objects equal or greater > > + * in size than watermark. > > + */ > > + ret = kstrtoint(buf, 10, &size_watermark); > > + if (ret) > > + return ret; > > + mode = RECOMPRESS_IDLE; > > + } > > + > > + if (size_watermark > PAGE_SIZE) > > nit: How about threshold instead of watermark? OK. MM uses watermarks everywhere so I just used the same term. Can change to threshold.