On (25/02/07 21:09), Yosry Ahmed wrote:
> Can we do some perf testing to make sure this custom locking is not
> regressing performance (selfishly I'd like some zswap testing too)?

So for zsmalloc I (usually) write some simple testing code which is
triggered via sysfs (device attr) and is completely reproducible, so
that I compare apples to apples. In this particular case I just have a
loop that creates objects (we don't need to compress or decompress
anything, zsmalloc doesn't really care):

- echo 1 > /sys/ ... / test_prepare

	for (sz = 32; sz < PAGE_SIZE; sz += 64) {
		for (i = 0; i < 4096; i++) {
			ent->sz = sz;
			ent->handle = zs_malloc(zram->mem_pool, sz);
			list_add(ent);
		}
	}

And then I just `perf stat` the writes:

- perf stat echo 1 > /sys/ ... / test_exec_old

	list_for_each_entry
		zs_map_object(ent->handle, ZS_MM_RO);
		zs_unmap_object(ent->handle);

	list_for_each_entry
		dst = zs_map_object(ent->handle, ZS_MM_WO);
		memcpy(dst, tmpbuf, ent->sz);
		zs_unmap_object(ent->handle);

- perf stat echo 1 > /sys/ ... / test_exec_new

	list_for_each_entry
		dst = zs_obj_read_begin(ent->handle, loc);
		zs_obj_read_end(ent->handle, dst);

	list_for_each_entry
		zs_obj_write(ent->handle, tmpbuf, ent->sz);

- echo 1 > /sys/ ... / test_finish

	free all handles and entries

The nice part is that we don't depend on any of the upper layers and
don't even need to compress/decompress anything: we allocate objects of
the required sizes, memcpy static data into them (zsmalloc has no
opinion on the contents), and that's pretty much it.
OLD API
=======
10 runs

       369,205,778      instructions     #    0.80  insn per cycle
        40,467,926      branches         #  113.732 M/sec

       369,002,122      instructions     #    0.62  insn per cycle
        40,426,145      branches         #  189.361 M/sec

       369,051,170      instructions     #    0.45  insn per cycle
        40,434,677      branches         #  157.574 M/sec

       369,014,522      instructions     #    0.63  insn per cycle
        40,427,754      branches         #  201.464 M/sec

       369,019,179      instructions     #    0.64  insn per cycle
        40,429,327      branches         #  198.321 M/sec

       368,973,095      instructions     #    0.64  insn per cycle
        40,419,245      branches         #  234.210 M/sec

       368,950,705      instructions     #    0.64  insn per cycle
        40,414,305      branches         #  231.460 M/sec

       369,041,288      instructions     #    0.46  insn per cycle
        40,432,599      branches         #  155.576 M/sec

       368,964,080      instructions     #    0.67  insn per cycle
        40,417,025      branches         #  245.665 M/sec

       369,036,706      instructions     #    0.63  insn per cycle
        40,430,860      branches         #  204.105 M/sec

NEW API
=======
10 runs

       265,799,293      instructions     #    0.51  insn per cycle
        29,834,567      branches         #  170.281 M/sec

       265,765,970      instructions     #    0.55  insn per cycle
        29,829,019      branches         #  161.602 M/sec

       265,764,702      instructions     #    0.51  insn per cycle
        29,828,015      branches         #  189.677 M/sec

       265,836,506      instructions     #    0.38  insn per cycle
        29,840,650      branches         #  124.237 M/sec

       265,836,061      instructions     #    0.36  insn per cycle
        29,842,285      branches         #  137.670 M/sec

       265,887,080      instructions     #    0.37  insn per cycle
        29,852,881      branches         #  126.060 M/sec

       265,769,869      instructions     #    0.57  insn per cycle
        29,829,873      branches         #  210.157 M/sec

       265,803,732      instructions     #    0.58  insn per cycle
        29,835,391      branches         #  186.940 M/sec

       265,766,624      instructions     #    0.58  insn per cycle
        29,827,537      branches         #  212.609 M/sec

       265,843,597      instructions     #    0.57  insn per cycle
        29,843,650      branches         #  171.877 M/sec

x old-api-insn
+ new-api-insn
+--------------------------------------------------------------------------+
|+                                                                        x|
|+                                                                        x|
|+                                                                        x|
|+                                                                        x|
|+                                                                        x|
|+                                                                        x|
|+                                                                        x|
|+                                                                        x|
|+                                                                        x|
|+                                                                        x|
|A                                                                        A|
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10  3.689507e+08 3.6920578e+08 3.6901918e+08 3.6902586e+08     71765.519
+  10  2.657647e+08 2.6588708e+08 2.6580373e+08 2.6580734e+08     42187.024
Difference at 95.0% confidence
	-1.03219e+08 +/- 55308.7
	-27.9705% +/- 0.0149878%
	(Student's t, pooled s = 58864.4)

> Perhaps Kairui can help with that since he was already testing this
> series.

Yeah, would be great.