Hi,

Another iteration of the SMO patch set implementing suggestions from Al
and Willy on the last version as well as some feedback from comments on
the recent LWN article.

Applies on top of Linus' tree (tag: v5.1-rc4).

This is a patch set implementing movable objects within the SLUB
allocator.  This is work based on Christopher Lameter's patch set:

 https://lore.kernel.org/patchwork/project/lkml/list/?series=377335

The original code logic is from that set and implemented by Christopher.
Clean up, refactoring, documentation, and additional features by myself.
Responsibility for any bugs remaining falls solely with myself.

Patch #9 has changes to the XArray migration function as suggested by
Matthew, thank you.  The only other changes to this version are to the
dcache code.

dcache
------

It was noted on LWN that calling the dcache migration function
'd_migrate' is a misnomer because we are _not_ trying to migrate the
dentry objects but rather only free them.  As noted by Al dentry (and
inode) objects are inherently not relocatable.  What we are trying to
achieve here is, rather, to attempt to free a select group of dentry
objects.

The dcache patches are not intended to be a silver bullet fixing all
fragmentation within the dentry slab cache.  Instead we are trying to
make a non-invasive attempt at freeing up pages sparsely used by the
dentry slab cache.  This may be useful for a number of reasons e.g. we
_may_ be able to free a page that is stopping high order page
allocations.  This would be a useful capability.

Since this is only something that _may_ help the aim is to be
non-intrusive.  This version of the set adds a config option to
selectively build in the SMO stuff for the dcache.  Without this option
the only change this set makes to the dcache is adding a constructor.
With the constructor doing a spinlock_init() it is hoped this will at
best be a performance gain and at worst NOT be a performance reduction.
Benchmarking has found this to be the case, results are included below.

Patch #14 and #15 can be rolled into a single patch if #15 is found
favourable.

Changes since v2:

 - Improve the XArray migration function (thanks Matthew)
 - Fix the dcache constructor (thanks Alexander)
 - Rename the d_migrate function to d_partial_shrink (open to suggested
   improvement)
 - Totally re-write the dcache migration function based on schooling by
   Al

Thanks for looking at this,
Tobin.


=============================
dcache SMO patch benchmarking
=============================

Process
=======

We use 5.1-rc4 as the baseline.  We benchmark the SMO patchset with and
without CONFIG_DCACHE_SMO.

SMO patch set without CONFIG_DCACHE_SMO just adds a constructor to the
dcache, no other code added to the build.  Building with
CONFIG_DCACHE_SMO adds code to enable object migration for the dcache.

	cmd = `time find / -name fname-no-exist`
	drop_caches = `echo 2 > /proc/sys/vm/drop_caches`

1. Boot system
2. Run $cmd
3. Run $drop_caches
4. Run $cmd


Bare metal results
------------------

Machine: x86_64

Kernel configured with::

	make defconfig

- rc4 kernel (baseline)::

	time find / -name fname-no-exist
	real	0m29.799s
	user	0m1.519s
	sys	0m10.825s

	echo 2 > /proc/sys/vm/drop_caches

	time find / -name fname-no-exist
	real	0m6.828s
	user	0m0.952s
	sys	0m5.824s

- rc4 kernel with SMO patch set and !CONFIG_DCACHE_SMO::

	time find / -name fname-no-exist
	real	0m30.075s
	user	0m1.480s
	sys	0m10.754s

	echo 2 > /proc/sys/vm/drop_caches

	time find / -name fname-no-exist
	real	0m6.626s
	user	0m0.917s
	sys	0m5.661s

- rc4 kernel with SMO patch set and CONFIG_DCACHE_SMO::

	time find / -name fname-no-exist
	real	0m30.637s
	user	0m1.516s
	sys	0m11.603s

	echo 2 > /proc/sys/vm/drop_caches

	time find / -name fname-no-exist
	real	0m6.886s
	user	0m0.932s
	sys	0m5.907s


Qemu results
------------

Host machine: x86_64

Qemu kernel configured with::

	make defconfig
	make kvmconfig

Qemu invoked with::

	qemu-system-x86_64 \
		-enable-kvm \
		-m 4G \
		-hda arch.qcow \
		-kernel $kernel \
		-serial stdio \
		-display none \
		-append 'root=/dev/sda1 console=ttyS0 rw'

- rc4 kernel (baseline)::

	time find / -name fname-no-exist
	real	0m0.929s
	user	0m0.096s
	sys	0m0.168s

	echo 2 > /proc/sys/vm/drop_caches

	time find / -name fname-no-exist
	real	0m0.249s
	user	0m0.112s
	sys	0m0.133s

- rc4 kernel with SMO patch set and !CONFIG_DCACHE_SMO::

	time find / -name fname-no-exist
	real	0m1.018s
	user	0m0.095s
	sys	0m0.151s

	echo 2 > /proc/sys/vm/drop_caches

	time find / -name fname-no-exist
	real	0m0.191s
	user	0m0.083s
	sys	0m0.105s

- rc4 kernel with SMO patch set and CONFIG_DCACHE_SMO::

	time find / -name fname-no-exist
	real	0m0.763s
	user	0m0.091s
	sys	0m0.165s

	echo 2 > /proc/sys/vm/drop_caches

	time find / -name fname-no-exist
	real	0m0.192s
	user	0m0.062s
	sys	0m0.126s


I am not very experienced with benchmarking, if this is grossly
incorrect please do not hesitate to yell at me.  Any suggestions on
more/better benchmarking most appreciated.

Thanks,
Tobin.
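
As a rough aid for reading the bare metal numbers above, the sketch
below converts the 'real' values printed by `time` into seconds and
reports each configuration's change relative to the rc4 baseline.  It is
only an illustration and not part of the patch set: the helper names
(to_seconds, percent_change, report) are made up for this example, and
with a single run per configuration the percentages say nothing about
run-to-run variance.

```python
import re


def to_seconds(stamp):
    """Convert a shell `time` value such as '0m29.799s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def percent_change(baseline, sample):
    """Relative change of sample versus baseline, in percent."""
    base = to_seconds(baseline)
    return (to_seconds(sample) - base) / base * 100.0


# 'real' times copied from the bare metal results:
# (first run after boot, run after drop_caches).
BARE_METAL_REAL = {
    "baseline": ("0m29.799s", "0m6.828s"),
    "SMO, !CONFIG_DCACHE_SMO": ("0m30.075s", "0m6.626s"),
    "SMO, CONFIG_DCACHE_SMO": ("0m30.637s", "0m6.886s"),
}


def report(results):
    """Return one summary line per non-baseline configuration."""
    base_first, base_second = results["baseline"]
    lines = []
    for name, (first, second) in results.items():
        if name == "baseline":
            continue
        lines.append(
            f"{name}: first run {percent_change(base_first, first):+.2f}%, "
            f"after drop_caches {percent_change(base_second, second):+.2f}%"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report(BARE_METAL_REAL)))
```

The same helpers can be pointed at the Qemu numbers by building a second
dictionary in the same shape.  Repeating each run several times and
comparing means would give a firmer basis than these single samples.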

Tobin C. Harding (15):
  slub: Add isolate() and migrate() methods
  tools/vm/slabinfo: Add support for -C and -M options
  slub: Sort slab cache list
  slub: Slab defrag core
  tools/vm/slabinfo: Add remote node defrag ratio output
  tools/vm/slabinfo: Add defrag_used_ratio output
  tools/testing/slab: Add object migration test module
  tools/testing/slab: Add object migration test suite
  xarray: Implement migration function for objects
  tools/testing/slab: Add XArray movable objects tests
  slub: Enable moving objects to/from specific nodes
  slub: Enable balancing slabs across nodes
  dcache: Provide a dentry constructor
  dcache: Implement partial shrink via Slab Movable Objects
  dcache: Add CONFIG_DCACHE_SMO

 Documentation/ABI/testing/sysfs-kernel-slab |  14 +
 fs/dcache.c                                 | 106 ++-
 include/linux/slab.h                        |  71 ++
 include/linux/slub_def.h                    |  10 +
 lib/radix-tree.c                            |  13 +
 lib/xarray.c                                |  49 ++
 mm/Kconfig                                  |  14 +
 mm/slab_common.c                            |   2 +-
 mm/slub.c                                   | 819 ++++++++++++++++++--
 tools/testing/slab/Makefile                 |  10 +
 tools/testing/slab/slub_defrag.c            | 567 ++++++++++++++
 tools/testing/slab/slub_defrag.py           | 451 +++++++++++
 tools/testing/slab/slub_defrag_xarray.c     | 211 +++++
 tools/vm/slabinfo.c                         |  51 +-
 14 files changed, 2295 insertions(+), 93 deletions(-)
 create mode 100644 tools/testing/slab/Makefile
 create mode 100644 tools/testing/slab/slub_defrag.c
 create mode 100755 tools/testing/slab/slub_defrag.py
 create mode 100644 tools/testing/slab/slub_defrag_xarray.c

-- 
2.21.0