I’m seeing a strange queue-depth behaviour with a kernel-mapped RBD; librbd does not show this problem. The cluster comprises 4 nodes with 10Gb networking. I’m not including OSD details, as the test sample is small enough to fit entirely in page cache. I’m running fio against a kernel-mapped RBD:

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/dev/rbd/cache1/test2 --bs=4k --readwrite=randread --iodepth=1 --runtime=10 --size=1g
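(For completeness, the device was mapped in the usual way; assuming pool cache1 and image test2, to match the device path above, something like:

rbd map cache1/test2
# udev then creates /dev/rbd/cache1/test2, as well as the /dev/rbdN node
)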
See how initially I get a very high number of IOPS at queue depth 1, but this drops dramatically as soon as I start increasing the queue depth; it’s not until a depth of 32 that I get back to similar performance. Incidentally, when changing the read type to sequential instead of random, the oddity goes away. Running fio with the librbd engine and the same test options, I get the following:
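(The librbd run uses fio’s rbd ioengine rather than a mapped device, so the filename option is replaced by pool/image options; assuming the same pool and image and the default admin client, the invocation would be something like:

fio --randrepeat=1 --ioengine=rbd --clientname=admin --pool=cache1 --rbdname=test2 --direct=1 --gtod_reduce=1 --name=test --bs=4k --readwrite=randread --iodepth=1 --runtime=10 --size=1g
)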
As you can see, the performance scales up nicely, although the top-end IOPS seem limited to around 18k. I don’t know if this is due to kernel/userspace performance differences or if there is a lower maximum queue depth limit in librbd. Both tests were run on a small sample size to force the OSD data into page cache and rule out any device latency. Does anyone know why kernel-mapped RBDs show this weird behaviour? I don’t think it can be OSD/Ceph config related, as it only happens with krbd. A couple of block-layer settings on the mapped device that seem worth ruling out are sketched below.
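For what it’s worth, since sequential reads don’t show the oddity, readahead seems like an obvious thing to rule out; these are the sysfs knobs I’d check first (assuming the image is mapped as rbd0; just a sketch of where to look, not tuning advice):

cat /sys/block/rbd0/queue/read_ahead_kb   # readahead only benefits sequential reads
cat /sys/block/rbd0/queue/nr_requests     # block-layer request queue limit
cat /sys/block/rbd0/queue/scheduler       # active I/O scheduler for the device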
Nick