Currently zram runs compression and decompression in non-preemptible
sections, e.g.

    zcomp_stream_get()     // grabs CPU local lock
    zcomp_compress()

or

    zram_slot_lock()       // grabs entry spin-lock
    zcomp_stream_get()     // grabs CPU local lock
    zs_map_object()        // grabs rwlock and CPU local lock
    zcomp_decompress()

This is potentially troublesome for a number of reasons. For instance,
it makes it impossible to use async compression algorithms and/or H/W
compression algorithms, which can wait for OP completion or resource
availability. It also restricts what compression algorithms can do
internally; for example, zstd can allocate internal state memory for
C/D dictionaries:

    do_fsync()
     do_writepages()
      zram_bio_write()
       zram_write_page()       // becomes non-preemptible
        zcomp_compress()
         zstd_compress()
          ZSTD_compress_usingCDict()
           ZSTD_compressBegin_usingCDict_internal()
            ZSTD_resetCCtx_usingCDict()
             ZSTD_resetCCtx_internal()
              zstd_custom_alloc()    // memory allocation

Not to mention that the system can be configured to maximize compression
ratio at a cost of CPU/HW time (e.g. lz4hc or deflate with a very high
compression level), so zram can stay in a non-preemptible section (even
under spin-lock and/or rwlock) for an extended period of time.

Aside from compression algorithms, this also restricts what zram itself
can do. One particular example is zsmalloc handle allocation in
zram_write_page(), which has an optimistic allocation path (disallowing
direct reclaim) and a pessimistic fallback path, which then forces zram
to compress the page one more time.

This series changes zram so that it no longer directly imposes atomicity
restrictions on compression algorithms (or on itself), which makes zram
write() fully preemptible; zram read(), sadly, is not always preemptible
yet. There are still indirect atomicity restrictions imposed by
zsmalloc. One notable example is the object mapping API, which returns
with:

    a) local CPU lock held
    b) zspage rwlock held

First, zsmalloc is converted to use a sleepable RW-"lock" (in fact an
atomic_t) for zspage migration protection. Second, a new handle mapping
API is introduced which doesn't use per-CPU buffers (and hence no local
CPU lock) and does fewer memcpy() calls, but requires users to provide a
pointer to a temp buffer for object copy-in (when needed). Third, zram
is converted to the new zsmalloc mapping API, and thus zram read()
becomes preemptible.
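For illustration, below is a minimal sketch of the kind of sleepable
atomic_t-based RW-lock used for zspage migration protection. The names
(szl_*) and details are made up for this example and do not match
mm/zsmalloc.c exactly; the important property is that lock holders stay
preemptible, since nothing here disables preemption or interrupts:

    #include <linux/atomic.h>
    #include <linux/sched.h>
    #include <linux/types.h>

    /* state: -1 writer, 0 unlocked, >0 number of readers */
    #define SZL_WRITER  (-1)

    struct szl_lock {
        atomic_t state;
    };

    static void szl_read_lock(struct szl_lock *l)
    {
        int old;

        for (;;) {
            old = atomic_read(&l->state);
            if (old == SZL_WRITER) {
                /* writer active: reschedule instead of spinning */
                cond_resched();
                continue;
            }
            /* take a reader reference if the state didn't change */
            if (atomic_cmpxchg(&l->state, old, old + 1) == old)
                break;
        }
    }

    static void szl_read_unlock(struct szl_lock *l)
    {
        atomic_dec(&l->state);
    }

    static bool szl_write_trylock(struct szl_lock *l)
    {
        /* succeeds only when there are no readers and no writer */
        return atomic_cmpxchg(&l->state, 0, SZL_WRITER) == 0;
    }

    static void szl_write_unlock(struct szl_lock *l)
    {
        atomic_set(&l->state, 0);
    }

In this sketch the writer (migration) side uses trylock and backs off
on contention, while readers (map/read paths) may sleep with the lock
held, which is exactly what a preemptible read() path needs.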
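And a sketch of how a read path can use the new mapping API with a
caller-provided buffer. The zs_obj_read_begin()/zs_obj_read_end() names
follow the series, but treat the exact signatures as an assumption;
error handling is simplified:

    #include <linux/zsmalloc.h>
    #include <linux/slab.h>
    #include <linux/string.h>

    static int read_obj(struct zs_pool *pool, unsigned long handle,
                        void *dst, size_t size)
    {
        /*
         * Caller-provided bounce buffer, used only when the object
         * spans two physical pages (copy-in case).
         */
        void *local_copy = kmalloc(PAGE_SIZE, GFP_KERNEL);
        void *mem;

        if (!local_copy)
            return -ENOMEM;

        /*
         * No per-CPU buffer and no local CPU lock: the section
         * between begin and end stays preemptible.
         */
        mem = zs_obj_read_begin(pool, handle, local_copy);
        memcpy(dst, mem, size);
        zs_obj_read_end(pool, handle, mem);

        kfree(local_copy);
        return 0;
    }

In practice the temp buffer would be preallocated (e.g. per compression
stream) rather than allocated per read as done here for brevity.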
v4 -> v5:
- switched to preemptible per-CPU comp streams (Yosry)
- switched to preemptible bit-locks for zram entry locking (Andrew)
- added lockdep annotations to new zsmalloc/zram locks (Hillf, Yosry)
- perf measurements
- reworked re-compression loop (a bunch of minor fixes)
- fixed potential physical page leaks on writeback/recompression error paths
- documented new locking rules

Sergey Senozhatsky (18):
  zram: sleepable entry locking
  zram: permit preemption with active compression stream
  zram: remove crypto include
  zram: remove max_comp_streams device attr
  zram: remove two-staged handle allocation
  zram: remove writestall zram_stats member
  zram: limit max recompress prio to num_active_comps
  zram: filter out recomp targets based on priority
  zram: rework recompression loop
  zsmalloc: factor out pool locking helpers
  zsmalloc: factor out size-class locking helpers
  zsmalloc: make zspage lock preemptible
  zsmalloc: introduce new object mapping API
  zram: switch to new zsmalloc object mapping API
  zram: permit reclaim in zstd custom allocator
  zram: do not leak page on recompress_store error path
  zram: do not leak page on writeback_store error path
  zram: add might_sleep to zcomp API

 Documentation/ABI/testing/sysfs-block-zram  |   8 -
 Documentation/admin-guide/blockdev/zram.rst |  36 +-
 drivers/block/zram/backend_zstd.c           |  11 +-
 drivers/block/zram/zcomp.c                  |  43 +-
 drivers/block/zram/zcomp.h                  |   8 +-
 drivers/block/zram/zram_drv.c               | 286 +++++++------
 drivers/block/zram/zram_drv.h               |  22 +-
 include/linux/zsmalloc.h                    |   8 +
 mm/zsmalloc.c                               | 420 +++++++++++++++----
 9 files changed, 536 insertions(+), 306 deletions(-)

-- 
2.48.1.502.g6dc24dfdaf-goog