On Mon, Mar 07, 2022 at 05:27:24PM +0300, Ananda wrote:
> From: Ananda Badmaev <a.badmaev@xxxxxxxxxxxx>
>
> Ztree stores integer number of compressed objects per ztree block.
> These blocks consist of several physical pages (from 1 to 8) and are
> arranged in trees.
> The range from 0 to PAGE_SIZE is divided into the number of intervals
> corresponding to the number of trees and each tree only operates objects of
> size from its interval. Thus the block trees are isolated from each other,
> which makes it possible to simultaneously perform actions with several
> objects from different trees.
> Blocks make it possible to densely arrange objects of various sizes
> resulting in low internal fragmentation. Also this allocator tries to fill
> incomplete blocks instead of adding new ones thus in many cases providing a
> compression ratio substantially higher than z3fold and zbud.
> Apart from greater flexibility, ztree is significantly superior to other
> zpool backends with regard to the worst execution times, thus allowing for
> better response time and real-time characteristics of the whole system.
>
> Signed-off-by: Ananda Badmaev <a.badmaev@xxxxxxxxxxxx>
> ---
>
> v2: fixed compiler warnings
>
> v3: added documentation and const modifier to struct tree_descr
>
>  Documentation/vm/ztree.rst | 104 +++++
>  MAINTAINERS                |   7 +
>  mm/Kconfig                 |  18 +
>  mm/Makefile                |   1 +
>  mm/ztree.c                 | 754 +++++++++++++++++++++++++++++++++++++
>  5 files changed, 884 insertions(+)
>  create mode 100644 Documentation/vm/ztree.rst
>  create mode 100644 mm/ztree.c

There are a lot of style issues; please run scripts/checkpatch.pl.

> diff --git a/Documentation/vm/ztree.rst b/Documentation/vm/ztree.rst
> new file mode 100644
> index 000000000000..78cad0a6d616
> --- /dev/null
> +++ b/Documentation/vm/ztree.rst
> @@ -0,0 +1,104 @@
> +.. _ztree:
> +
> +=====
> +ztree
> +=====
> +
> +Ztree stores integer number of compressed objects per ztree block.
> +These blocks consist of several consecutive physical pages (from 1 to 8)
> +and are arranged in trees. The range from 0 to PAGE_SIZE is divided into
> +the number of intervals corresponding to the number of trees and each tree
> +only operates objects of size from its interval. Thus the block trees are
> +isolated from each other, which makes it possible to simultaneously
> +perform actions with several objects from different trees.
> +
> +Blocks make it possible to densely arrange objects of various sizes
> +resulting in low internal fragmentation. Also this allocator tries to fill
> +incomplete blocks instead of adding new ones thus in many cases providing
> +a compression ratio substantially higher than z3fold and zbud. Apart from
> +greater flexibility, ztree is significantly superior to other zpool
> +backends with regard to the worst execution times, thus allowing for better
> +response time and real-time characteristics of the whole system.
> +
> +Like z3fold and zsmalloc ztree_alloc() does not return a dereferenceable
> +pointer. Instead, it returns an unsigned long handle which encodes actual
> +location of the allocated object.
> +
> +Unlike others ztree works well with objects of various sizes - both highly
> +compressed and poorly compressed including cases where both types are present.
> +
> +Tests
> +=====

I don't think the sections below belong to the Documentation.
IMO they are more suitable to the changelog.

> +
> +Test platform
> +-------------
> +
> +Qemu arm64 virtual board with debian 11.
> +
> +Kernel
> +------
> +
> +Linux 5.17-rc6 with ztree and zram over zpool patch. Additionally, counters and
> +time measurements using ktime_get_ns() have been added to ZPOOL API.
> +
> +Tools
> +-----
> +
> +ZRAM disks of size 1000M/1500M/2G, fio 3.25.
> +
> +Test description
> +----------------
> +
> +Run 2 fio scripts in parallel - one with VALUE=50, other with VALUE=70.
> +This emulates page content heterogeneity.
> +
> +fio --bs=4k --randrepeat=1 --randseed=100 --refill_buffers \
> +    --scramble_buffers=1 --buffer_compress_percentage=VALUE \
> +    --direct=1 --loops=1 --numjobs=1 --filename=/dev/zram0 \
> +    --name=seq-write --rw=write --stonewall --name=seq-read \
> +    --rw=read --stonewall --name=seq-readwrite --rw=rw --stonewall \
> +    --name=rand-readwrite --rw=randrw --stonewall
> +
> +Results
> +-------
> +
> +ztree
> +~~~~~
> +
> +* average malloc time (us): 3.8
> +* average free time (us): 3.1
> +* average map time (us): 4.5
> +* average unmap time (us): 1.2
> +* worst zpool op time (us): ~2200
> +* total zpool ops exceeding 1000 us: 29
> +
> +zsmalloc
> +~~~~~~~~
> +
> +* average malloc time (us): 10.3
> +* average free time (us): 6.5
> +* average map time (us): 3.2
> +* average unmap time (us): 1.2
> +* worst zpool op time (us): ~6200
> +* total zpool ops exceeding 1000 us: 1031
> +
> +z3fold
> +~~~~~~
> +
> +* average malloc time (us): 20.8
> +* average free time (us): 29.9
> +* average map time (us): 3.4
> +* average unmap time (us): 1.4
> +* worst zpool op time (us): ~4900
> +* total zpool ops exceeding 1000 us: 100
> +
> +zbud
> +~~~~
> +
> +* average malloc time (us): 8.1
> +* average free time (us): 4.0
> +* average map time (us): 0.3
> +* average unmap time (us): 0.3
> +* worst zpool op time (us): ~9400
> +* total zpool ops exceeding 1000 us: 727

-- 
Sincerely yours,
Mike.