Hi Niklas,

> > > 100MB/s is sequential, your scrubbing is random.
>
> afaik everything is random.
>
> Is there any docs that explain this, any code, or other definitive
> answer?

Do a fio[1] test on a disk to see how it performs under certain
conditions. Or look at atop during scrubbing; it will give you an
impression of how many % of your disk performance is being used.

> Also, wouldn't it make sense for scrubbing to be able to read the
> disk linearly, at least to some significant extent?

I would also think so, but I have no idea how this is implemented.

> > > Changing scrubbing settings does not help (see below).
> >
> > I think you should be able to use the full performance of the disk
> > when you run ceph tell osd.* injectargs '--osd_max_scrubs=X'.
>
> In my post I already showed that increasing `osd_max_scrubs` e.g. by 3x
> does not help.
>
> Also, what would be the logic of how it could?

I would argue it can, because an individual scrub is not using all the
disk resources. When you allow 2 scrub sessions on the same disk, it uses
2x the IOs, which of course comes at the cost of available client IO.

> If random IO is thrashing disk seeks, how could querying more concurrent
> disk seeks help?

It is, but a single scrub session is not taking all of your disk IO. None
of the recovery procedures do, afaik, because the cluster likes to serve
client IO first. The larger the cluster, the more often some part of the
cluster is doing recovery.

> > ceph tell osd.* injectargs '--osd_recovery_sleep_hdd=0.100000'
>
> There is no recovery going on in the cluster.

Yes, I know, but this is a throttling factor; maybe something like this
exists for scrubbing.

The question you should ask yourself is why you want to
change/investigate this. I also like to have a well-performing cluster,
but I have never looked at scrubbing, except to turn it off before a
reboot/update or so.
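Regarding a scrub equivalent of that recovery sleep: Ceph does have
scrub-side throttles, e.g. osd_scrub_sleep next to osd_max_scrubs. A
minimal sketch of how you could inspect and adjust them is below; it
assumes a release that has the ceph config command, and option names and
defaults differ between releases, so check the docs for your version
first.

  # show the scrub-related settings one OSD is currently running with
  ceph config show osd.0 | grep -E 'osd_max_scrubs|osd_scrub_sleep'

  # allow more concurrent scrubs per OSD (more IOs spent on scrubbing,
  # fewer left for client traffic)
  ceph config set osd osd_max_scrubs 2

  # osd_scrub_sleep is the scrub counterpart of osd_recovery_sleep_hdd:
  # a pause injected between scrub chunks; lower means faster scrubs
  # but more impact on client IO
  ceph tell osd.* injectargs '--osd_scrub_sleep=0.0'

  # watch how busy the disk really is while a scrub runs (atop works too)
  iostat -x 1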
[1]

[global]
ioengine=libaio
#ioengine=posixaio
invalidate=1
ramp_time=30
iodepth=1
runtime=180
time_based
direct=1
filename=/dev/sdX
#filename=/mnt/disk/fio-bench.img

[write-4k-seq]
stonewall
bs=4k
rw=write

[randwrite-4k-seq]
stonewall
bs=4k
rw=randwrite
fsync=1

[read-4k-seq]
stonewall
bs=4k
rw=read

[randread-4k-seq]
stonewall
bs=4k
rw=randread
fsync=1

[rw-4k-seq]
stonewall
bs=4k
rw=rw

[randrw-4k-seq]
stonewall
bs=4k
rw=randrw

[randrw-4k-d4-seq]
stonewall
bs=4k
rw=randrw
iodepth=4

[randread-4k-d32-seq]
stonewall
bs=4k
rw=randread
iodepth=32

[randwrite-4k-d32-seq]
stonewall
bs=4k
rw=randwrite
iodepth=32

[write-128k-seq]
stonewall
bs=128k
rw=write

[randwrite-128k-seq]
stonewall
bs=128k
rw=randwrite

[read-128k-seq]
stonewall
bs=128k
rw=read

[randread-128k-seq]
stonewall
bs=128k
rw=randread

[rw-128k-seq]
stonewall
bs=128k
rw=rw

[randrw-128k-seq]
stonewall
bs=128k
rw=randrw

[write-1024k-seq]
stonewall
bs=1024k
rw=write

[randwrite-1024k-seq]
stonewall
bs=1024k
rw=randwrite

[read-1024k-seq]
stonewall
bs=1024k
rw=read

[randread-1024k-seq]
stonewall
bs=1024k
rw=randread

[rw-1024k-seq]
stonewall
bs=1024k
rw=rw

[randrw-1024k-seq]
stonewall
bs=1024k
rw=randrw

[write-4096k-seq]
stonewall
bs=4096k
rw=write

[write-4096k-d16-seq]
stonewall
bs=4M
rw=write
iodepth=16

[randwrite-4096k-seq]
stonewall
bs=4096k
rw=randwrite

[read-4096k-seq]
stonewall
bs=4096k
rw=read

[read-4096k-d16-seq]
stonewall
bs=4M
rw=read
iodepth=16

[randread-4096k-seq]
stonewall
bs=4096k
rw=randread

[rw-4096k-seq]
stonewall
bs=4096k
rw=rw

[randrw-4096k-seq]
stonewall
bs=4096k
rw=randrw
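If you want to use that job file: save it as e.g. hdd-bench.fio (the name
is just an example) and point filename= at an unused disk or a test file
first; the write jobs are destructive when aimed at a raw device. The
stonewall option makes the jobs run one after another.

  # run the whole benchmark
  fio hdd-bench.fio

  # or run a single job section only, e.g. the 4k random reads
  fio --section=randread-4k-seq hdd-bench.fio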