librbd doesn't know that you are using libaio vs POSIX AIO. Therefore, the best bet is that the issue is in fio or glibc. As a first step, I would recommend using blktrace (or similar) within your VM to determine if there is a delta between libaio and POSIX AIO at the block level.
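For example, a comparison along these lines inside the guest should show whether the request stream reaching the virtual disk actually differs between the two engines. This is only a sketch: the /dev/vda device name, the 60-second window, and the output file names are assumptions, so adjust them for your VM.

  # Trace the guest block device while fio runs with each engine
  # (assumes the test file lives on /dev/vda and debugfs is mounted).
  blktrace -d /dev/vda -o posixaio -w 60 &
  fio --name=randread-posix --ioengine=posixaio --buffered=0 --direct=1 --rw=randread --bs=4k --size=1024m --iodepth=32 --runtime=60
  wait

  blktrace -d /dev/vda -o libaio -w 60 &
  fio --name=randread-libaio --ioengine=libaio --buffered=0 --direct=1 --rw=randread --bs=4k --size=1024m --iodepth=32 --runtime=60
  wait

  # Merge the per-CPU trace files and compare the latency/queue-depth summaries.
  blkparse -i posixaio -d posixaio.bin > /dev/null && btt -i posixaio.bin
  blkparse -i libaio -d libaio.bin > /dev/null && btt -i libaio.bin

If the per-request latencies look the same at the block layer for both engines, the extra time is being spent above it (fio or glibc); if they already differ there, the guest block stack is submitting I/O differently in the two cases.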
On Fri, Mar 10, 2017 at 12:28 PM, Xavier Trilla <xavier.trilla@xxxxxxxxxxxxxxxx> wrote:
> I disabled rbd cache but there was no improvement, just a huge performance drop in writes (which proves the cache was properly disabled).
>
> Now I'm working on three other fronts:
>
> - Using librbd with jemalloc in the hypervisors (Hammer .10)
> - Compiling QEMU with jemalloc (QEMU 2.6)
> - Running some tests from a bare-metal server using fio, but it will use librbd directly, so there is no way to simulate POSIX AIO (maybe I'll try via krbd)
>
> I'm quite sure it is something on the client side, but I don't know enough about the Ceph internals to completely rule out the issue being related to the OSDs. So far the performance of the OSDs is really good with other test engines, so I'm working more on the client side.
>
> Any help or information would be really welcome :)
>
> Thanks,
> Xavier
>
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On behalf of Xavier Trilla
> Sent: Friday, March 10, 2017 14:13
> To: Alexandre DERUMIER <aderumier@xxxxxxxxx>
> CC: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: Posix AIO vs libaio read performance
>
> Hi Alexandre,
>
> Debugging is disabled in the client and the OSDs.
>
> Regarding rbd cache, it is something I will try (I was thinking about it today), but I haven't tried it yet because I don't want to reduce write speed.
>
> I also tried iothreads, but no benefit.
>
> I tried virtio-blk and virtio-scsi as well; there is a small improvement with virtio-blk, but it's only around 10%.
>
> This is becoming quite a strange issue, as it only affects POSIX AIO read performance. Nothing else seems to be affected, although POSIX AIO write is still nowhere near libaio performance.
>
> Thanks for your help; if you have any other ideas they will be really appreciated.
>
> Also, could somebody run the following command from inside a VM in their cluster?
>
> fio --name=randread-posix --output ./test --runtime 60 --ioengine=posixaio --buffered=0 --direct=1 --rw=randread --bs=4k --size=1024m --iodepth=32
>
> It would be really helpful to know if I'm the only one affected or if this is happening in all qemu + ceph setups.
>
> Thanks!
> Xavier
>
> On March 10, 2017, at 8:07, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>
> > > But it still looks like there is some bottleneck in QEMU or librbd that I cannot manage to find.
> >
> > you can improve latency on the client by disabling debug logging.
> >
> > on your client, create a /etc/ceph/ceph.conf with
> >
> > [global]
> > debug asok = 0/0
> > debug auth = 0/0
> > debug buffer = 0/0
> > debug client = 0/0
> > debug context = 0/0
> > debug crush = 0/0
> > debug filer = 0/0
> > debug filestore = 0/0
> > debug finisher = 0/0
> > debug heartbeatmap = 0/0
> > debug journal = 0/0
> > debug journaler = 0/0
> > debug lockdep = 0/0
> > debug mds = 0/0
> > debug mds balancer = 0/0
> > debug mds locker = 0/0
> > debug mds log = 0/0
> > debug mds log expire = 0/0
> > debug mds migrator = 0/0
> > debug mon = 0/0
> > debug monc = 0/0
> > debug ms = 0/0
> > debug objclass = 0/0
> > debug objectcacher = 0/0
> > debug objecter = 0/0
> > debug optracker = 0/0
> > debug osd = 0/0
> > debug paxos = 0/0
> > debug perfcounter = 0/0
> > debug rados = 0/0
> > debug rbd = 0/0
> > debug rgw = 0/0
> > debug throttle = 0/0
> > debug timer = 0/0
> > debug tp = 0/0
> >
> > you can also disable the rbd cache (rbd_cache = false), or in qemu set cache=none.
> >
> > Using an iothread on the qemu drive should help a little bit too.
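For reference, a minimal sketch of what the cache=none plus iothread combination looks like on a plain QEMU command line. The pool, image, client id, memory size, and object/device ids below are placeholders; with libvirt the equivalent is an <iothreads> element in the domain plus an iothread attribute on the disk <driver>. As far as I know, cache=none on the drive is also what disables the librbd cache for that disk.

  # Placeholder guest: one RBD-backed virtio-blk disk served by a dedicated iothread.
  qemu-system-x86_64 -machine accel=kvm -m 2048 \
    -object iothread,id=iothread0 \
    -drive file=rbd:rbd/vm-100-disk-1:id=admin,format=raw,if=none,id=drive0,cache=none \
    -device virtio-blk-pci,drive=drive0,iothread=iothread0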
> > ----- Original Message -----
> > From: "Xavier Trilla" <xavier.trilla@xxxxxxxxxxxxxxxx>
> > To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> > Sent: Friday, March 10, 2017 05:37:01
> > Subject: Re: Posix AIO vs libaio read performance
> >
> > > Hi,
> > >
> > > We compiled Hammer .10 to use jemalloc and cluster performance has now improved a lot, but POSIX AIO operations are still quite a bit slower than libaio.
> > >
> > > Now, with a single thread, read operations are about 1,000 per second and write operations about 5,000 per second.
> > >
> > > Using the same fio configuration but with libaio, read operations are about 15K per second and writes about 12K per second.
> > >
> > > I'm compiling QEMU with jemalloc support as well, and I'm planning to replace librbd on the QEMU hosts with the new build using jemalloc.
> > >
> > > But it still looks like there is some bottleneck in QEMU or librbd that I cannot manage to find.
> > >
> > > Any help will be much appreciated.
> > >
> > > Thanks.
> > >
> > > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On behalf of Xavier Trilla
> > > Sent: Thursday, March 9, 2017 6:56
> > > To: ceph-users@xxxxxxxxxxxxxx
> > > Subject: Posix AIO vs libaio read performance
> > >
> > > Hi,
> > >
> > > I'm trying to debug why there is such a big difference between POSIX AIO and libaio when performing read tests from inside a VM using librbd.
> > >
> > > The results I'm getting with fio are:
> > >
> > > POSIX AIO read:
> > > Type: Random Read - IO Engine: POSIX AIO - Buffered: No - Direct: Yes - Block Size: 4KB - Disk Target: /
> > > Average: 2.54 MB/s
> > > Average: 632 IOPS
> > >
> > > libaio read:
> > > Type: Random Read - IO Engine: libaio - Buffered: No - Direct: Yes - Block Size: 4KB - Disk Target: /
> > > Average: 147.88 MB/s
> > > Average: 36967 IOPS
> > >
> > > When performing writes the differences aren't so big, because the cluster (which is in production right now) is CPU bound:
> > >
> > > POSIX AIO write:
> > > Type: Random Write - IO Engine: POSIX AIO - Buffered: No - Direct: Yes - Block Size: 4KB - Disk Target: /
> > > Average: 14.87 MB/s
> > > Average: 3713 IOPS
> > >
> > > libaio write:
> > > Type: Random Write - IO Engine: libaio - Buffered: No - Direct: Yes - Block Size: 4KB - Disk Target: /
> > > Average: 14.51 MB/s
> > > Average: 3622 IOPS
> > >
> > > Even though the write results are CPU bound, as the machines containing the OSDs don't have enough CPU to handle all the IOPS (CPU upgrades are on their way), I still cannot really understand why I'm seeing so much difference in the read tests.
> > >
> > > Some configuration background:
> > >
> > > - Cluster and clients are using Hammer 0.94.90
> > > - It's a full-SSD cluster running on Samsung Enterprise SATA SSDs, with all the typical tweaks (customized ceph.conf, optimized sysctl, etc.)
> > > - Tried QEMU 2.0 and 2.7: similar results
> > > - Tried virtio-blk and virtio-scsi: similar results
> > >
> > > I've been reading about POSIX AIO and libaio, and I can see there are several differences in how they work (like one being implemented in user space and the other in the kernel), but I don't really get why Ceph has such problems handling POSIX AIO read operations, but not write operations, or how to avoid them.
> > >
> > > Right now I'm trying to identify whether it's something wrong with our Ceph cluster setup, with Ceph in general, or with QEMU (virtio-scsi or virtio-blk, as both show the same behavior).
> > >
> > > If you would like to try to reproduce the issue, here are the two command lines I'm using:
> > >
> > > fio --name=randread-posix --output ./test --runtime 60 --ioengine=posixaio --buffered=0 --direct=1 --rw=randread --bs=4k --size=1024m --iodepth=32
> > > fio --name=randread-libaio --output ./test --runtime 60 --ioengine=libaio --buffered=0 --direct=1 --rw=randread --bs=4k --size=1024m --iodepth=32
> > >
> > > If you could shed any light on this it would be really helpful, as right now, although I still have some ideas left to try, I don't have much idea about why this is happening…
> > >
> > > Thanks!
> > > Xavier

--
Jason

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com