On Thu, Feb 27, 2025 at 01:35:32PM +0900, Sergey Senozhatsky wrote: > Current object mapping API is a little cumbersome. First, it's > inconsistent, sometimes it returns with page-faults disabled and > sometimes with page-faults enabled. Second, and most importantly, > it enforces atomicity restrictions on its users. zs_map_object() > has to return a liner object address which is not always possible > because some objects span multiple physical (non-contiguous) > pages. For such objects zsmalloc uses a per-CPU buffer to which > object's data is copied before a pointer to that per-CPU buffer > is returned back to the caller. This leads to another, final, > issue - extra memcpy(). Since the caller gets a pointer to > per-CPU buffer it can memcpy() data only to that buffer, and > during zs_unmap_object() zsmalloc will memcpy() from that per-CPU > buffer to physical pages that object in question spans across. > > New API splits functions by access mode: > - zs_obj_read_begin(handle, local_copy) > Returns a pointer to handle memory. For objects that span two > physical pages a local_copy buffer is used to store object's > data before the address is returned to the caller. Otherwise > the object's page is kmap_local mapped directly. > > - zs_obj_read_end(handle, buf) > Unmaps the page if it was kmap_local mapped by zs_obj_read_begin(). > > - zs_obj_write(handle, buf, len) > Copies len-bytes from compression buffer to handle memory > (takes care of objects that span two pages). This does not > need any additional (e.g. per-CPU) buffers and writes the data > directly to zsmalloc pool pages. > > In terms of performance, on a synthetic and completely reproducible > test that allocates fixed number of objects of fixed sizes and > iterates over those objects, first mapping in RO then in RW mode: > > OLD API > ======= > > 3 first results out of 10 > > 369,205,778 instructions # 0.80 insn per cycle > 40,467,926 branches # 113.732 M/sec > > 369,002,122 instructions # 0.62 insn per cycle > 40,426,145 branches # 189.361 M/sec > > 369,036,706 instructions # 0.63 insn per cycle > 40,430,860 branches # 204.105 M/sec > > [..] > > NEW API > ======= > > 3 first results out of 10 > > 265,799,293 instructions # 0.51 insn per cycle > 29,834,567 branches # 170.281 M/sec > > 265,765,970 instructions # 0.55 insn per cycle > 29,829,019 branches # 161.602 M/sec > > 265,764,702 instructions # 0.51 insn per cycle > 29,828,015 branches # 189.677 M/sec > > [..] > > T-test on all 10 runs > ===================== > > Difference at 95.0% confidence > -1.03219e+08 +/- 55308.7 > -27.9705% +/- 0.0149878% > (Student's t, pooled s = 58864.4) > > The old API will stay around until the remaining users switch > to the new one. After that we'll also remove zsmalloc per-CPU > buffer and CPU hotplug handling. > > The split of map(RO) and map(WO) into read_{begin/end}/write is > suggested by Yosry Ahmed. > > Suggested-by: Yosry Ahmed <yosry.ahmed@xxxxxxxxx> > Signed-off-by: Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx> I see my Reviewed-by was removed at some point. Did something change in this patch (do I need to review it again) or was it just lost?