Hi Jens, guys,

I am sending this as an RFC to see if there is any future in it, or any
ideas on how to make it better. I also need to improve some items (as
mentioned in the 2/2 commit message) and test a lot more.

The general idea is that we change from allocating a single array of
sbitmap words to allocating a sub-array per NUMA node. Each CPU in a
node is then hinted to use that node's sub-array.

Initial performance looks decent.

Some figures:
System: 4-nodes (with memory on all nodes), 128 CPUs

null blk config block: 20 devs, submit_queues=NR_CPUS, shared_tags,
shared_tag_bitmap, hw_queue_depth=256

fio config: bs=4096, iodepth=128, numjobs=10,
cpus_allowed_policy=split, rw=read, ioscheduler=none

Before: 7130K
After: 7630K

So a +7% IOPS gain.

Any comments welcome, thanks!

Based on v5.18-rc6.

John Garry (2):
  sbitmap: Make sbitmap.map a double pointer
  sbitmap: Spread sbitmap word allocation over NUMA nodes

 include/linux/sbitmap.h | 16 +++++---
 lib/sbitmap.c           | 83 +++++++++++++++++++++++++++++++++--------
 2 files changed, 79 insertions(+), 20 deletions(-)

--
2.26.2