Hi Alexandre,

But can we use aio=native with a librbd volume, or will QEMU simply ignore it? (My understanding is that for networked volumes such as Ceph, aio=native makes no difference, and that it can only be used with RAW disks.)

Thanks!
Xavi

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderumier@xxxxxxxxx]
Sent: Saturday, 11 March 2017 7:25
To: Xavier Trilla <xavier.trilla@xxxxxxxxxxxxxxxx>
CC: ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: Posix AIO vs libaio read performance

>> Regarding rbd cache, is something I will try -today I was thinking about it- but I did not try it yet because I don't want to reduce write speed.

Note that rbd_cache only helps with sequential writes, so it doesn't help with random writes.

Also, internally, QEMU forces aio=threads when cache=writeback is enabled, but it can use aio=native with cache=none.

----- Original Message -----
From: "Xavier Trilla" <xavier.trilla@xxxxxxxxxxxxxxxx>
To: "aderumier" <aderumier@xxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Friday, 10 March 2017 14:12:59
Subject: Re: Posix AIO vs libaio read performance

Hi Alexandre,

Debugging is disabled on the client and the OSDs.

Regarding the rbd cache, it is something I will try (I was thinking about it today), but I have not tried it yet because I don't want to reduce write speed.

I also tried iothreads, but saw no benefit.

I tried both virtio-blk and virtio-scsi as well; there is a small improvement with virtio-blk, but only around 10%.

This is becoming quite a strange issue, as it only affects POSIX AIO read performance. Nothing else seems to be affected, although POSIX AIO write performance is nowhere near libaio performance.

Thanks for your help; if you have any other ideas, they will be really appreciated.

Also, if somebody could run the following command from inside a VM in their cluster:

    fio --name=randread-posix --output ./test --runtime 60 --ioengine=posixaio --buffered=0 --direct=1 --rw=randread --bs=4k --size=1024m --iodepth=32

It would be really helpful to know whether I'm the only one affected or whether this is happening in all QEMU + Ceph setups.

Thanks!
Xavier

On 10 Mar 2017, at 8:07, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:

> But it still looks like there is some bottleneck in QEMU or librbd I cannot manage to find.

You can improve latency on the client by disabling debugging.

On your client, create a /etc/ceph/ceph.conf with:

    [global]
    debug asok = 0/0
    debug auth = 0/0
    debug buffer = 0/0
    debug client = 0/0
    debug context = 0/0
    debug crush = 0/0
    debug filer = 0/0
    debug filestore = 0/0
    debug finisher = 0/0
    debug heartbeatmap = 0/0
    debug journal = 0/0
    debug journaler = 0/0
    debug lockdep = 0/0
    debug mds = 0/0
    debug mds balancer = 0/0
    debug mds locker = 0/0
    debug mds log = 0/0
    debug mds log expire = 0/0
    debug mds migrator = 0/0
    debug mon = 0/0
    debug monc = 0/0
    debug ms = 0/0
    debug objclass = 0/0
    debug objectcacher = 0/0
    debug objecter = 0/0
    debug optracker = 0/0
    debug osd = 0/0
    debug paxos = 0/0
    debug perfcounter = 0/0
    debug rados = 0/0
    debug rbd = 0/0
    debug rgw = 0/0
    debug throttle = 0/0
    debug timer = 0/0
    debug tp = 0/0

You can also disable the rbd cache (rbd_cache=false) or, in QEMU, set cache=none.

Using an iothread on the QEMU drive should help a little bit too.

----- Original Message -----
From: "Xavier Trilla" <xavier.trilla@xxxxxxxxxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Friday, 10 March 2017 05:37:01
Subject: Re: Posix AIO vs libaio read performance

Hi,

We compiled Hammer .10 to use jemalloc, and cluster performance has improved a lot, but POSIX AIO operations are still considerably slower than libaio.
Now, with a single thread, read operations are about 1000 per second and write operations about 5000 per second. Using the same fio configuration but with libaio, read operations are about 15K per second and writes about 12K per second.

I'm compiling QEMU with jemalloc support as well, and I'm planning to replace librbd on the QEMU hosts with the new build that uses jemalloc.

But it still looks like there is some bottleneck in QEMU or librbd that I cannot manage to find.

Any help will be much appreciated.

Thanks.

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On behalf of Xavier Trilla
Sent: Thursday, 9 March 2017 6:56
To: ceph-users@xxxxxxxxxxxxxx
Subject: Posix AIO vs libaio read performance

Hi,

I'm trying to debug why there is a big difference between POSIX AIO and libaio when performing read tests from inside a VM using librbd.

The results I'm getting using fio are:

POSIX AIO Read:
  Type: Random Read - IO Engine: POSIX AIO - Buffered: No - Direct: Yes - Block Size: 4KB - Disk Target: /
  Average: 2.54 MB/s
  Average: 632 IOPS

Libaio Read:
  Type: Random Read - IO Engine: Libaio - Buffered: No - Direct: Yes - Block Size: 4KB - Disk Target: /
  Average: 147.88 MB/s
  Average: 36967 IOPS

When performing writes the differences aren't so big, because the cluster (which is in production right now) is CPU bound:

POSIX AIO Write:
  Type: Random Write - IO Engine: POSIX AIO - Buffered: No - Direct: Yes - Block Size: 4KB - Disk Target: /
  Average: 14.87 MB/s
  Average: 3713 IOPS

Libaio Write:
  Type: Random Write - IO Engine: Libaio - Buffered: No - Direct: Yes - Block Size: 4KB - Disk Target: /
  Average: 14.51 MB/s
  Average: 3622 IOPS

Even if the write results are CPU bound, as the machines containing the OSDs don't have enough CPU to handle all the IOPS (CPU upgrades are on their way), I cannot really understand why I'm seeing so much difference in the read tests.

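As a quick sanity check on the figures above, here is a minimal Python sketch that recomputes throughput from IOPS at a 4 KB block size and the libaio to POSIX AIO ratios. It was not part of the original tests; the helper name implied_mb_s is made up for illustration, and it assumes the reported MB/s figures use 1 MB = 1000 KB.

```python
# Figures copied from the fio results above (4 KB random I/O, direct, unbuffered).
BLOCK_KB = 4

results = {
    "posixaio read": {"iops": 632, "mb_s": 2.54},
    "libaio read": {"iops": 36967, "mb_s": 147.88},
    "posixaio write": {"iops": 3713, "mb_s": 14.87},
    "libaio write": {"iops": 3622, "mb_s": 14.51},
}


def implied_mb_s(iops, block_kb=BLOCK_KB):
    """Throughput implied by an IOPS figure, assuming 1 MB = 1000 KB."""
    return iops * block_kb / 1000


for name, r in results.items():
    print(f"{name}: {r['iops']} IOPS -> {implied_mb_s(r['iops']):.2f} MB/s "
          f"(reported {r['mb_s']} MB/s)")

# How much faster libaio is than POSIX AIO for each direction.
read_ratio = results["libaio read"]["iops"] / results["posixaio read"]["iops"]
write_ratio = results["libaio write"]["iops"] / results["posixaio write"]["iops"]
print(f"libaio/posixaio read ratio:  {read_ratio:.1f}x")
print(f"libaio/posixaio write ratio: {write_ratio:.2f}x")
```

Under that assumption the reported MB/s and IOPS figures agree with each other, so the gap is in the operation rate itself: reads differ by a factor of more than 50, while writes are essentially equal.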
Some configuration background:

- Cluster and clients are using Hammer 0.94.90
- It's a full SSD cluster running on Samsung Enterprise SATA SSDs, with all the typical tweaks (customized ceph.conf, optimized sysctl, etc.)
- Tried QEMU 2.0 and 2.7: similar results
- Tried virtio-blk and virtio-scsi: similar results

I've been reading about POSIX AIO and libaio, and I can see there are several differences in how they work (such as one being implemented in user space and the other in the kernel), but I don't really understand why Ceph has such problems handling POSIX AIO read operations but not write operations, or how to avoid them.

Right now I'm trying to identify whether something is wrong with our Ceph cluster setup, with Ceph in general, or with QEMU (virtio-scsi or virtio-blk, as both show the same behavior).

If you would like to try to reproduce the issue, here are the two command lines I'm using:

    fio --name=randread-posix --output ./test --runtime 60 --ioengine=posixaio --buffered=0 --direct=1 --rw=randread --bs=4k --size=1024m --iodepth=32

    fio --name=randread-libaio --output ./test --runtime 60 --ioengine=libaio --buffered=0 --direct=1 --rw=randread --bs=4k --size=1024m --iodepth=32

If you could shed any light on this it would be really helpful, as right now, although I still have some ideas left to try, I don't have much idea why this is happening...

Thanks!
Xavier

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com