I'm curious to hear if anyone has looked into kernel scheduling tweaks or changes in order to improve qd=1/bs=4k performance (while we patiently wait for Seastar!).

Using this toy cluster:

3x OSD nodes: Atom C3758 (8 cores, 2.2 GHz) with 1x Intel S4500 SSD each
Debian Bullseye with Linux 5.15.14 and Pacific 16.2.7, both compiled from source.

I was able to get ~550 IOPS "out of the box" using the command below to benchmark. By "out of the box", I mean after applying the *proven* optimizations like forcing the processors into C1, disabling P-states, etc. (rough commands for everything I tried are at the end of this mail).

$ rbd bench --io-type write image01 -p testbench --io-threads=1 --io-size 4K --io-pattern rand --rbd_cache=false

After looking at flamegraphs/perf/etc., I suspected some ping-ponging between threads. Maybe this is due to the "low" number of cores on the toy cluster? I also observed that some tunables like "ms async send inline" helped slightly, but they are off by default because they don't work well with high core counts.

As a more drastic step, I partitioned the system into two cgroups: 6 cores for the OSDs and 2 cores for everything else ("junk" cores, effectively). I should also mention that my kernel is compiled with NO_HZ and that I enabled it for the non-junk cores, thinking that some of the Ceph tasks might benefit from the opportunity to go tickless. In doing so, I noticed a small improvement, but at this point I was still only at maybe ~600-625 IOPS.

I next assigned the OSD processes a SCHED_RR policy and wow... an immediate jump to 800+ IOPS. For a cluster that consumes < 100 W (switch included), I am pretty pleased! I'm not really sure of the implications of having the OSD threads in RR, though, even though the cores are dedicated to that purpose...

Anyway, has anyone looked at goofing around with scheduling parameters on something resembling a production-grade Ceph cluster? I'm curious to know if it helps -- it's probably no secret that bluestore's current "low" QD1 performance is hindered by the locks and context switching involved in the lifecycle of an IOP... maybe a few scheduler changes could help unlock a good chunk of performance?

Cheers,
Tyler
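
P.S. For anyone who wants to poke at this, here is roughly what I did, step by step. Treat these as sketches from my particular setup rather than a recipe.

The C-state/P-state pinning was just the usual kernel parameters plus the performance governor:

  intel_idle.max_cstate=1 processor.max_cstate=1    (kernel command line)
$ sudo cpupower frequency-set -g performance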
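
The flamegraphs came from plain perf plus the FlameGraph scripts (assuming stackcollapse-perf.pl and flamegraph.pl are on your PATH), something like:

$ sudo perf record -F 99 -g -p $(pgrep -d, ceph-osd) -- sleep 30
$ sudo perf script | stackcollapse-perf.pl | flamegraph.pl > osd.svg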
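
The messenger tunable I mentioned is spelled ms_async_send_inline in the config; something along these lines flips it at runtime (it can also go in ceph.conf):

$ ceph config set osd ms_async_send_inline true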
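
The cgroup split plus the tickless cores were along these lines. Core numbers are specific to my 8-core Atoms (0-1 are the "junk" cores, 2-7 are for the OSDs), and cset shield is just a convenient stand-in for the two raw cpuset cgroups I described -- on a cgroup-v2-only box you would do the same with systemd's AllowedCPUs= or the cpuset controller directly:

  nohz_full=2-7    (kernel command line; kernel built with NO_HZ_FULL)
$ sudo cset shield --cpu=2-7 --kthread=on                  # everything else gets pushed onto cores 0-1
$ sudo cset shield --shield --pid=$(pgrep -d, ceph-osd)    # move the OSD processes into the shielded set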
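
And the SCHED_RR change was just chrt against the running OSD. chrt only touches one thread at a time, so loop over the TIDs (one OSD per node in my case; the priority of 50 was an arbitrary middle-of-the-road pick):

$ for tid in /proc/$(pgrep ceph-osd)/task/*; do sudo chrt --rr -p 50 ${tid##*/}; done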