Hello dear Ceph developers and users,

I've spent some time tuning and measuring our Ceph cluster performance and noticed something quite strange. I've been benchmarking with fio, using both the rbd engine directly on the hosts and the direct block (aio) engine inside qemu-kvm guests (with qemu connected to the Ceph storage via rbd), and in both cases the client side generates a huge amount of CPU load, so the CLIENT appears to be the bottleneck.

For example, when I measure the raw SSD performance on one of the Ceph OSD nodes, I get 100k IOPS with fio (which is fine, according to the SSD specs), but when I measure the performance of a volume in the Ceph SSD pool, it's much worse. I could understand it if the bottleneck were the ceph-osd processes (or some other Ceph component), but it looks to me like fio with the rbd engine is the problem here - it can eat 6 CPU cores by itself. It's very similar when accessing the Ceph storage through qemu, which also shows very high CPU utilisation (I'm using virtio-scsi for the guest disk emulation). This happens for both random and sequential IO.

Preloading libtcmalloc helps fio (I also tried building qemu against libtcmalloc, which helps as well), but it still seems to me that something could be wrong in librbd. Has anyone else noticed this behaviour?

I've seen in some mail threads that disabling cephx authentication can help a lot, but I don't really like that idea and haven't tried it yet.

To make it concrete, a simplified sketch of the kind of fio job and tcmalloc preload I'm talking about is pasted below my signature.

with best regards

nik

--
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@xxxxxxxxxxx
-------------------------------------
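P.S.: for illustration only, a minimal fio job file along the lines of what I described above - the cephx user, pool name and image name are just placeholders, not my actual setup:

    # minimal fio job against an rbd image (names below are placeholders)
    [global]
    ioengine=rbd
    # cephx user and pool/image to test against - adjust to your setup
    clientname=admin
    pool=ssdpool
    rbdname=fio-test
    invalidate=0
    rw=randread
    bs=4k
    iodepth=32
    runtime=60
    time_based=1
    group_reporting=1

    [rbd-4k-randread]

And the tcmalloc preload is simply something like this (the library path is distro-dependent, this one is only an example):

    # path to libtcmalloc varies by distribution
    LD_PRELOAD=/usr/lib64/libtcmalloc.so.4 fio rbd-test.fio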