On (22/11/03 10:00), Minchan Kim wrote: [..] > > Per-my understanding this threshold can change quite often, > > depending on memory pressure and so on. So we may force > > user-space to issues more syscalls, without any gain in > > simplicity. > > Sorry, didn't understand your point. Let me clarify my idea. > If we have separate knob for recompress thresh hold, we could > work like this. > > # recompress any compressed pages which is greater than 888 bytes. > echo 888 > /sys/block/zram0/recompress_threshold > > # try to compress any pages greather than threshold with following > # algorithm. > > echo "type=lzo priority=1" > /sys/block/zram0/recompress_algo > echo "type=zstd priority=2" > /sys/block/zram0/recompress_algo > echo "type=deflate priority=3" > /sys/block/zram0/recompress_algo OK. We can always add more sysfs knobs and make threshold a global per-device value. I think I prefer the approach when threshold is part of the current recompress context, not something derived form another context. That is, when all values (page type, threshold, possibly algorithm index) are submitted by user-space for this particular recompression echo "type=huge threshold=3000 ..." > recompress If threshold is a global value that is applied to all recompress calls then how does user-space say no-threshold? For instance, when it wants to recompress only huge pages. It probably still needs to supply something like threshold=0. So my personal preference for now - keep threshold as a context dependent value. Another thing that I like about threshold= being context dependent is that then we don't need to protect recompression against concurrent global threshold modifications with lock and so on. It keeps things simpler. [..] > > > Let's squeeze the comp algo index into meta area since we have > > > some rooms for the bits. Then can we could remove the specific > > > recomp two flags? > > > > What is meta area? > > zram->table[index].flags > > If we squeeze the algorithm index, we could work like this > without ZRAM_RECOMP_SKIP. We still need ZRAM_RECOMP_SKIP. Recompression may fail to compress object further: sometimes we can get recompressed object that is larger than the original one, sometimes of the same size, sometimes of a smaller size but still belonging to the same size class, which doesn't save us any memory. Without ZRAM_RECOMP_SKIP we will continue re-compressing objects that are in-compressible (in a way that saves us memory in zsmalloc) by any of the ZRAM's algorithms. > read_block_state > zram_algo_idx(zram, index) > 0 ? 'r' : '.'); > > zram_read_from_zpool > if (zram_algo_idx(zram, idx) != 0) > idx = 1; As an idea, maybe we can store everything re-compression related in a dedicated meta field? SKIP flag, algorithm ID, etc. We don't have too many bits left in ->flags on 32-bit systems. We currently probably need at least 3 bits - one for RECOMP_SKIP and at least 2 for algorithm ID. 2 bits for algorithm ID put us into situation that we can have only 00, 01, 10, 11 as IDs, that is maximum 3 recompress algorithms: 00 is the primary one and the rest are alternative ones. Maximum 3 re-compression algorithms sounds like a reasonable max value to me. Yeah, maybe we can use flags bits for it. [..] > > > zram_bvec_read: > > > algo_idx = zram_get_algo_idx(zram, index); > > > zstrm = zcomp_stream_get(zram, algo_idx); > > > zcomp_decompress(zstrm); > > > zcomp_stream_put(zram, algo_idx); > > > > Hmm. This is something that should not be enabled by default. > > Exactly. I don't mean to enable by default, either. OK. > > N compressions per every stored page is very very CPU and > > power intensive. We definitely want a way to have recompression > > as a user-space event, which gives all sorts of flexibility and > > extensibility. ZRAM doesn't (and should not) know about too many > > things, so ZRAM can't make good decisions (and probably should not > > try). User-space can make good decisions on the other hand. > > > > So recompression for us is not something that happens all the time, > > unconditionally. It's something that happens sometimes, depending on > > the situation on the host. > > Totally agree. I am not saying we should enable the feature by default > but at lesat consider it for the future. I have something in mind to > be useful later. OK. > > [..] > > > > +static int zram_recompress(struct zram *zram, u32 index, struct page *page, > > > > + int size_watermark) > > > > +{ > > > > + unsigned long handle_prev; > > > > + unsigned long handle_next; > > > > + unsigned int comp_len_next; > > > > + unsigned int comp_len_prev; > > > > > > How about orig_handle and new_nandle with orig_comp_len and new_comp_len? > > > > No opinion. Can we have prev and next? :) > > prev and next gives the impression position something like list. > orig and new gives the impression stale and fresh. > > We are doing latter here. Yeah, like I said in internal email, this will make rebasing harder on my side, because this breaks a patch from Alexey and then breaks a higher order zspages patch series. It's an very old series and we already have quite a bit of patches depending on it.