Re: Posix AIO vs libaio read performance

After some tests, I just wanted to post my findings about this. It looks like, for some reason, POSIX AIO reads (at least with fio) are not really asynchronous: the results I'm getting are very similar to those I get with the sync engine instead of the POSIX AIO engine.

The biggest improvement has come from using jemalloc instead of TCMalloc. It really improved latency (and the OSDs' CPU usage), but I still don't understand why POSIX AIO reads with fio give such bad results, whereas POSIX AIO writes are a lot faster.

But as I've said, it looks like something is wrong with fio and POSIX AIO. From what I've checked, the only way to run a different POSIX AIO test would be to write one myself, or have one of our developers do it (my developer days are way behind me...).
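
Short of a full POSIX AIO test program, the suspected mechanism can be modelled. The aio(7) man page notes that Linux POSIX AIO is implemented in user space by glibc using threads, whereas libaio submits requests to the kernel. The sketch below is a simplified model of a thread-backed engine, not glibc's code: requests are queued up to `iodepth`, but only `workers` threads execute blocking `pread()` calls. `run_reads`, `make_test_file`, the `workers` knob and the artificial `delay` are hypothetical names for illustration; how many threads glibc actually uses for one file descriptor is not established in this thread.

```python
import os
import random
import tempfile
import threading
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # 4 KiB, as in the fio runs (--bs=4k)


def make_test_file(nblocks=256):
    """Create a temporary file of random data; the caller removes it."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(nblocks * BLOCK))
        return tmp.name


def run_reads(path, iodepth, workers, nreqs=64, delay=0.0):
    """Issue `nreqs` random 4 KiB reads with up to `iodepth` requests outstanding,
    executed by `workers` threads doing blocking pread() (hypothetical model).

    Returns (peak number of reads executing at once, total bytes read).
    `delay` is an artificial per-read sleep standing in for storage latency.
    """
    nblocks = os.path.getsize(path) // BLOCK
    sem = threading.BoundedSemaphore(iodepth)  # caps queued + running requests
    lock = threading.Lock()
    active = peak = 0
    fd = os.open(path, os.O_RDONLY)  # no O_DIRECT here, so reads may hit the page cache

    def one_read(offset):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            data = os.pread(fd, BLOCK, offset)
            if delay:
                time.sleep(delay)
            return len(data)
        finally:
            with lock:
                active -= 1

    try:
        futures = []
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for _ in range(nreqs):
                sem.acquire()  # "submit" only while fewer than iodepth are outstanding
                fut = pool.submit(one_read, random.randrange(nblocks) * BLOCK)
                fut.add_done_callback(lambda _f: sem.release())
                futures.append(fut)
        total = sum(f.result() for f in futures)
    finally:
        os.close(fd)
    return peak, total
```

With `workers=1`, `run_reads(path, iodepth=32, workers=1)` never has more than one read executing, despite the nominal depth of 32; raising `workers` lets the peak climb toward the depth. That is the pattern to look for in real measurements, not proof of how glibc behaves.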

Anyway, at least this issue helped us improve the latency and overall performance of our Ceph cluster by a huge margin :)

Thanks for all your help!
Xavi.

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On behalf of Xavier Trilla
Sent: Friday, March 10, 2017 20:28
To: dillaman@xxxxxxxxxx
CC: ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: Posix AIO vs libaio read performance

Hi Jason,

Just to add more information:

- The issue doesn't seem to be fio- or glibc-related (in the guest), as it works properly in other environments using the same software versions. I've also tried Ubuntu 14.04 and 16.04 and get very similar results, but I'll run more tests just to be 100% sure.
- If I increase the number of concurrent jobs in fio (e.g. 16), results are much better (above 10k IOPS).
- I'm seeing similarly bad results with KRBD, but I still need to run more tests on this front. I'm using KRBD from inside a VM, because getting hold of a physical test machine in our infrastructure is quite difficult, but I'll manage. The VM has a 10G connection, and I'm mounting the RBD volume from inside the VM with the kernel module (4.4), so the result should give an idea of how KRBD will perform.
- I'm not seeing improvements with librbd compiled with jemalloc support.
- No difference between QEMU 2.0, 2.5 and 2.7.
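
Little's law (requests in flight = throughput × mean latency) gives a quick consistency check on these numbers. The sketch below plugs in figures reported further down this thread (4 KiB random reads at `--iodepth=32`: 632 IOPS with posixaio, 36967 IOPS with libaio); the function names are illustrative. Caveat: the 632 IOPS figure was measured before the jemalloc rebuild, so the 16-job prediction is only a rough cross-check against the "above 10k IOPS" observation.

```python
def implied_latency_ms(iops: float, in_flight: float) -> float:
    """Little's law: in_flight = iops * latency, so latency = in_flight / iops."""
    return in_flight / iops * 1000.0


def predicted_iops(in_flight: float, latency_ms: float) -> float:
    """Little's law rearranged: iops = in_flight / latency."""
    return in_flight / (latency_ms / 1000.0)


if __name__ == "__main__":
    # Figures reported in this thread (4 KiB random reads, iodepth=32).
    libaio_read_iops = 36967
    posix_read_iops = 632
    print(f"libaio, 32 in flight:   {implied_latency_ms(libaio_read_iops, 32):.2f} ms/read")
    print(f"posixaio, 32 in flight: {implied_latency_ms(posix_read_iops, 32):.2f} ms/read")
    print(f"posixaio, 1 in flight:  {implied_latency_ms(posix_read_iops, 1):.2f} ms/read")
    # If each posixaio job really keeps ~1 read in flight, 16 jobs should scale ~16x.
    lat = implied_latency_ms(posix_read_iops, 1)
    print(f"16 jobs x 1 in flight:  {predicted_iops(16, lat):.0f} IOPS")
```

If posixaio really had 32 reads in flight, each 4 KiB read would take about 50.6 ms on an all-SSD cluster, versus about 0.87 ms for libaio. Roughly 1.6 ms per read at an effective depth of 1 is far more plausible, and 16 such jobs would give about 10,100 IOPS, in line with the 16-job result above.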

It looks like it's related to an interaction between how POSIX AIO handles direct reads and how Ceph works (though it could also be KVM-related). I could argue it's because this is networked storage, but in other environments such as Amazon EBS I'm not seeing this issue. Obviously I have no idea about EBS internals, but I guess that's what we are trying to match: if it works properly on EBS, it should work properly on Ceph too ;) I'm also still trying to verify whether this only affects my setup or all Ceph installations.

One of the things I find strangest is the performance difference on the read side. libaio performance is much better for both reads and writes, but the biggest difference is between POSIX AIO reads and librbd reads.

BTW: do you have a test environment where you could test fio with POSIX AIO? I've been running tests on our production and test clusters, but they run almost the same versions (Hammer) of everything :/ Maybe I'll try to deploy a new cluster with Jewel, if I can get my hands on enough hardware. Here are the fio command lines:

POSIX AIO:
fio --name=randread-posix --runtime 60 --ioengine=posixaio --buffered=0 --direct=1 --rw=randread --bs=4k --size=1024m --iodepth=32

Libaio:
fio --name=randread-libaio --runtime 60 --ioengine=libaio --buffered=0 --direct=1 --rw=randread --bs=4k --size=1024m --iodepth=32
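
To test the "behaves like the sync engine" observation directly, it helps to run the same job across engines, queue depths and job counts. A sketch that generates those command lines from the two above; `--numjobs` and `--group_reporting` are standard fio options that the original commands don't use, and the particular matrix is only an assumption. Note that `--size` applies per job, so 16 jobs lay out 16 separate 1 GiB files.

```python
import shlex


def fio_cmd(engine: str, iodepth: int, numjobs: int = 1, runtime: int = 60):
    """Build an argv list matching the command lines above, varying only
    the engine, iodepth and numjobs."""
    return [
        "fio",
        f"--name=randread-{engine}-qd{iodepth}-j{numjobs}",
        f"--runtime={runtime}",
        f"--ioengine={engine}",
        "--buffered=0",
        "--direct=1",
        "--rw=randread",
        "--bs=4k",
        "--size=1024m",
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        "--group_reporting",  # report all jobs of one run as a single total
    ]


def sweep():
    """Hypothetical test matrix: sync as a baseline, then posixaio and libaio."""
    for engine in ("sync", "posixaio", "libaio"):
        depths = (1,) if engine == "sync" else (1, 32)  # iodepth has no effect on sync
        for qd in depths:
            for jobs in (1, 16):
                yield fio_cmd(engine, qd, jobs)


if __name__ == "__main__":
    for argv in sweep():
        print(shlex.join(argv))  # paste into the VM, or run via subprocess.run(argv)
```

If posixaio at `--iodepth=32` matches posixaio at `--iodepth=1` (and the sync baseline) while libaio scales with depth, that would point at the submission path inside the guest rather than at the cluster.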

Also, thanks for the blktrace tip; on Monday I'll start playing with it and post my findings.

Thanks!
Xavier

-----Original Message-----
From: Jason Dillaman [mailto:jdillama@xxxxxxxxxx]
Sent: Friday, March 10, 2017 19:18
To: Xavier Trilla <xavier.trilla@xxxxxxxxxxxxxxxx>
CC: Alexandre DERUMIER <aderumier@xxxxxxxxx>; ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: Posix AIO vs libaio read performance

librbd doesn't know that you are using libaio vs POSIX AIO. Therefore, the best bet is that the issue is in fio or glibc. As a first step, I would recommend using blktrace (or similar) within your VM to determine if there is a delta between libaio and POSIX AIO at the block level.
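
The blktrace check can be made concrete by measuring how many reads the guest actually keeps outstanding at the device. A minimal sketch, assuming blkparse's default text output (fields: device, CPU, sequence, timestamp, PID, action, RWBS, sector `+` blocks, process). It pairs `D` (issued to driver) and `C` (completed) events by device, sector and size, a simplification that request merges or splits can break; `max_reads_in_flight` is a hypothetical helper name.

```python
def max_reads_in_flight(lines):
    """Peak number of reads issued to the driver (D) but not yet completed (C),
    parsed from default-format blkparse text output."""
    events = []
    for line in lines:
        f = line.split()
        # Expected: dev cpu seq time pid action rwbs sector + nblocks [proc]
        if len(f) < 10 or f[8] != "+":
            continue  # skip headers, summaries and non-I/O messages
        action, rwbs = f[5], f[6]
        if action not in ("D", "C") or "R" not in rwbs:
            continue  # only reads issued to / completed by the driver
        events.append((float(f[3]), action, (f[0], int(f[7]), int(f[9]))))
    events.sort(key=lambda e: e[0])  # stable sort keeps input order on ties
    outstanding = set()
    peak = 0
    for _, action, key in events:
        if action == "D":
            outstanding.add(key)
            peak = max(peak, len(outstanding))
        else:
            outstanding.discard(key)
    return peak
```

Capture a trace in the VM while fio runs, for example with `blktrace -d /dev/vda -o - | blkparse -i - > trace.txt` (the device name is an assumption; virtio-scsi disks usually appear as `/dev/sdX`), then call `max_reads_in_flight(open("trace.txt"))`. A peak near 1 for posixaio and near 32 for libaio would be exactly the block-level delta suggested above.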

On Fri, Mar 10, 2017 at 12:28 PM, Xavier Trilla <xavier.trilla@xxxxxxxxxxxxxxxx> wrote:
> I disabled rbd cache but saw no improvement, just a huge performance drop
> in writes (which proves the cache was properly disabled).
>
> Now I'm working on three other fronts:
>
> - Using librbd with jemalloc on the hypervisors (Hammer .10)
> - Compiling QEMU with jemalloc (QEMU 2.6)
> - Running some tests from a bare-metal server with fio, but it will use
> librbd directly, so there's no way to simulate POSIX AIO (maybe I'll try via KRBD)
>
> I'm quite sure it's something on the client side, but I don't know
> enough about Ceph internals to totally rule out the issue being related to the OSDs.
> So far, though, OSD performance is really good with other test
> engines, so I'm focusing on the client side.
>
> Any help or information would be really welcome :)
>
> Thanks.
>
> Xavier.
>
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On behalf of
> Xavier Trilla
> Sent: Friday, March 10, 2017 14:13
> To: Alexandre DERUMIER <aderumier@xxxxxxxxx>
> CC: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: Posix AIO vs libaio read performance
>
> Hi Alexandre,
>
> Debugging is disabled on the client and the OSDs.
>
> Regarding rbd cache, it's something I will try (I was thinking about it
> today), but I haven't tried it yet because I don't want to reduce write speed.
>
> I also tried iothreads, but with no benefit.
>
> I also tried virtio-blk and virtio-scsi; there is a small improvement
> with virtio-blk, but only around 10%.
>
> This is becoming quite a strange issue, as it only affects POSIX AIO
> read performance. Nothing else seems to be affected (although POSIX
> AIO write performance is nowhere near libaio's).
>
> Thanks for your help; if you have any other ideas they will be really
> appreciated.
>
> Also, it would be really helpful if somebody could run the following
> command in their cluster from inside a VM:
>
> fio --name=randread-posix --output ./test --runtime 60 --ioengine=posixaio --buffered=0 --direct=1 --rw=randread --bs=4k --size=1024m --iodepth=32
>
> That would tell me whether I'm the only one affected or whether this is
> happening in all QEMU + Ceph setups.
>
> Thanks!
>
> Xavier
>
> On 10 Mar 2017, at 8:07, Alexandre DERUMIER <aderumier@xxxxxxxxx>
> wrote:
>
> But it still looks like there is some bottleneck in QEMU or librbd that I
> cannot manage to find.
>
> You can improve latency on the client by disabling debug logging.
>
> On your client, create a /etc/ceph/ceph.conf with:
>
> [global]
> debug asok = 0/0
> debug auth = 0/0
> debug buffer = 0/0
> debug client = 0/0
> debug context = 0/0
> debug crush = 0/0
> debug filer = 0/0
> debug filestore = 0/0
> debug finisher = 0/0
> debug heartbeatmap = 0/0
> debug journal = 0/0
> debug journaler = 0/0
> debug lockdep = 0/0
> debug mds = 0/0
> debug mds balancer = 0/0
> debug mds locker = 0/0
> debug mds log = 0/0
> debug mds log expire = 0/0
> debug mds migrator = 0/0
> debug mon = 0/0
> debug monc = 0/0
> debug ms = 0/0
> debug objclass = 0/0
> debug objectcacher = 0/0
> debug objecter = 0/0
> debug optracker = 0/0
> debug osd = 0/0
> debug paxos = 0/0
> debug perfcounter = 0/0
> debug rados = 0/0
> debug rbd = 0/0
> debug rgw = 0/0
> debug throttle = 0/0
> debug timer = 0/0
> debug tp = 0/0
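
Typing 35 `debug ... = 0/0` lines by hand is error-prone, so here is a small sketch that regenerates the same `[global]` section with Python's standard `configparser`. The subsystem list is copied from the block above; `debug_off_section` is a hypothetical helper name, and where the output goes (for example the client's /etc/ceph/ceph.conf) is up to you. Writing it straight to that path would replace any existing contents.

```python
import configparser
import io

# Subsystem names copied verbatim from the [global] section above.
SUBSYSTEMS = [
    "asok", "auth", "buffer", "client", "context", "crush", "filer",
    "filestore", "finisher", "heartbeatmap", "journal", "journaler",
    "lockdep", "mds", "mds balancer", "mds locker", "mds log",
    "mds log expire", "mds migrator", "mon", "monc", "ms", "objclass",
    "objectcacher", "objecter", "optracker", "osd", "paxos", "perfcounter",
    "rados", "rbd", "rgw", "throttle", "timer", "tp",
]


def debug_off_section(subsystems, level="0/0"):
    """Render a [global] section setting `debug <subsystem> = <level>` for each name."""
    cfg = configparser.ConfigParser()
    cfg["global"] = {f"debug {name}": level for name in subsystems}
    buf = io.StringIO()
    cfg.write(buf)  # writes "key = value" lines under the section header
    return buf.getvalue()


if __name__ == "__main__":
    print(debug_off_section(SUBSYSTEMS), end="")
```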
>
> You can also disable the cache with rbd_cache=false, or set cache=none in QEMU.
>
> Using an iothread on the QEMU drive should help a little bit too.
>
> ----- Original Message -----
> From: "Xavier Trilla" <xavier.trilla@xxxxxxxxxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Friday, March 10, 2017 05:37:01
> Subject: Re: Posix AIO vs libaio read performance
>
> Hi,
>
> We compiled Hammer .10 to use jemalloc, and cluster performance has
> improved a lot, but POSIX AIO operations are still much slower than libaio.
>
> Now, with a single thread, reads are about 1,000 operations per second and
> writes about 5,000 per second.
>
> With the same fio configuration but libaio, reads are about 15K
> per second and writes 12K per second.
>
> I'm compiling QEMU with jemalloc support as well, and I'm planning to
> replace librbd on the QEMU hosts with the new jemalloc build.
>
> But it still looks like there is some bottleneck in QEMU or librbd that I
> cannot manage to find.
>
> Any help will be much appreciated.
>
> Thanks.
>
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On behalf of
> Xavier Trilla
> Sent: Thursday, March 9, 2017 6:56
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Posix AIO vs libaio read performance
>
> Hi,
>
> I'm trying to debug why there is such a big difference between POSIX AIO and
> libaio when running read tests from inside a VM using librbd.
>
> The results I'm getting with fio are:
>
> POSIX AIO Read:
>
> Type: Random Read - IO Engine: POSIX AIO - Buffered: No - Direct: Yes - Block Size: 4KB - Disk Target: /:
>
> Average: 2.54 MB/s
> Average: 632 IOPS
>
> Libaio Read:
>
> Type: Random Read - IO Engine: Libaio - Buffered: No - Direct: Yes - Block Size: 4KB - Disk Target: /:
>
> Average: 147.88 MB/s
> Average: 36967 IOPS
>
> When performing writes the differences aren't as big, because the
> cluster (which is in production right now) is CPU-bound:
>
> POSIX AIO Write:
>
> Type: Random Write - IO Engine: POSIX AIO - Buffered: No - Direct: Yes - Block Size: 4KB - Disk Target: /:
>
> Average: 14.87 MB/s
> Average: 3713 IOPS
>
> Libaio Write:
>
> Type: Random Write - IO Engine: Libaio - Buffered: No - Direct: Yes - Block Size: 4KB - Disk Target: /:
>
> Average: 14.51 MB/s
> Average: 3622 IOPS
>
> Even if the write results are CPU-bound, because the machines hosting
> the OSDs don't have enough CPU to handle all the IOPS (CPU upgrades
> are on their way), I really can't understand why I'm seeing such a big
> difference in the read tests.
>
> Some configuration background:
>
> - Cluster and clients are using Hammer 0.94.90
> - It's an all-SSD cluster running on Samsung Enterprise SATA SSDs,
> with all the typical tweaks (customized ceph.conf, optimized sysctl, etc.)
> - Tried QEMU 2.0 and 2.7: similar results
> - Tried virtio-blk and virtio-scsi: similar results
>
> I've been reading about POSIX AIO and libaio, and I can see there are
> several differences in how they work (for example, one is implemented in
> user space and the other in the kernel), but I don't really understand why
> Ceph has such problems handling POSIX AIO read operations but not write
> operations, or how to avoid them.
>
> Right now I'm trying to identify whether something is wrong with our Ceph
> cluster setup, with Ceph in general, or with QEMU (virtio-scsi or
> virtio-blk, as both show the same behavior).
>
> If you'd like to try to reproduce the issue, here are the two
> command lines I'm using:
>
> fio --name=randread-posix --output ./test --runtime 60 --ioengine=posixaio --buffered=0 --direct=1 --rw=randread --bs=4k --size=1024m --iodepth=32
>
> fio --name=randread-libaio --output ./test --runtime 60 --ioengine=libaio --buffered=0 --direct=1 --rw=randread --bs=4k --size=1024m --iodepth=32
>
> If you could shed any light on this it would be really helpful because,
> although I still have some ideas left to try, right now I don't have
> much idea why this is happening...
>
> Thanks!
>
> Xavier
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>



--
Jason



