On Tue, Dec 29, 2015 at 5:20 PM, Francois Lafont <flafdivers@xxxxxxx> wrote:
> Hi,
>
> On 28/12/2015 09:04, Yan, Zheng wrote:
>
>>> Ok, so on a client node, I have mounted cephfs (via ceph-fuse) and a rados
>>> block device formatted in XFS. If I have understood correctly, cephfs uses
>>> sync IO (not async IO) and, with ceph-fuse, cephfs can't do O_DIRECT IO.
>>> So, I have tested this fio command on cephfs _and_ on rbd:
>>>
>>> fio --randrepeat=1 --ioengine=sync --direct=0 --gtod_reduce=1 --name=readwrite \
>>>     --filename=rw.data --bs=4k --iodepth=1 --size=300MB --readwrite=randrw \
>>>
>>> and indeed with cephfs _and_ rbd, I have approximately the same result:
>>> - cephfs => ~516 iops
>>> - rbd    => ~587 iops
>>>
>>> Is it consistent?
>>>
>> yes
>
> Ok, cool. ;)
>
>>> That being said, I'm unable to tell whether this is good performance with
>>> regard to my hardware configuration. I'm curious to know the result on
>>> other clusters with the same fio command.
>>
>> This fio command checks the performance of single-thread SYNC IO. If you
>> want to check overall throughput, you can try using buffered IO or
>> increasing the thread number.
>
> Ok, I have increased the thread number via the --numjobs option of fio
> and indeed, if I add up the iops of each job, it seems that I can reach
> something like ~1000 iops with ~5 jobs. This result seems more in line
> with my hardware configuration, isn't it?

yes

>
> And it seems that I can see the bottleneck of my little cluster (only
> 5 OSD servers with 4 osd daemons each). According to the "atop" command, I
> can see that some disks (4TB SATA 7200rpm Western Digital WD4000FYYZ) are
> very busy. It's curious because during the bench some disks are very busy
> and other disks are not so busy. But I think the reason is that it is a
> little cluster and, with just 15 osds (the 5 other osds are full-SSD osds
> dedicated to cephfs metadata), I can't have a perfectly even distribution
> of data, especially when the bench concerns just a specific file of a few
> hundred MB.

Do these disks have the same size and performance? Large disks (with higher
weights) or slow disks are likely to be busy.

>
> That being said, when you talk about "using buffered IO" I'm not sure which
> fio option that refers to. Is it the --buffered option? Because with this
> option I have noticed no change in iops. Personally, I was able to increase
> the global iops only with the --numjobs option.
>

I didn't make it clear. I actually meant buffered writes (add the
--rwmixread=0 option to fio); sample fio invocations for the multi-job,
buffered-write and AIO cases are sketched at the bottom of this message.
In your test case, writes are mixed with reads, and a read is synchronous
whenever it misses the cache.

Regards
Yan, Zheng

>> FYI, I have written a patch to add AIO support to the cephfs kernel client:
>> https://github.com/ceph/ceph-client/commits/testing
>
> Ok, thanks for the information, but I'm afraid I won't be able to test it
> immediately.
>
>>> * --direct=1 => ~1400 iops
>>> * --direct=0 => ~570 iops
>>>
>>> Why do I have this behavior? I thought it would be the opposite (better
>>> performance with --direct=0). Is it normal?
>>>
>> The Linux kernel only supports AIO for fds opened in O_DIRECT mode; when
>> the file is not opened in O_DIRECT mode, AIO is actually SYNC IO.
>
> Ok, so this is not ceph specific, it is a behavior of the Linux kernel.
> Good to know.
>
> Anyway, thanks _a_ _lot_ Yan for your very efficient help. I have learned
> lots of very interesting things.
>
> Regards.
>
> --
> François Lafont
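
For reference, a multi-job variant of the fio command discussed above might
look like the following; --numjobs=5 and --group_reporting are only
illustrative (--group_reporting makes fio sum the iops of all jobs into a
single report):

    fio --randrepeat=1 --ioengine=sync --direct=0 --gtod_reduce=1 --name=readwrite \
        --filename=rw.data --bs=4k --iodepth=1 --size=300MB --readwrite=randrw \
        --numjobs=5 --group_reporting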
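Likewise, a buffered-write run along the lines Yan suggests (writes only, so
no synchronous cache-miss reads) could be sketched as follows, assuming the
same test file:

    fio --randrepeat=1 --ioengine=sync --direct=0 --gtod_reduce=1 --name=bufwrite \
        --filename=rw.data --bs=4k --iodepth=1 --size=300MB --readwrite=randrw \
        --rwmixread=0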
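Finally, since the kernel only performs real AIO on file descriptors opened
with O_DIRECT, an async-IO run that actually benefits from a deeper queue
would combine --ioengine=libaio with --direct=1; the iodepth value here is
only illustrative:

    fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=aiorw \
        --filename=rw.data --bs=4k --iodepth=32 --size=300MB --readwrite=randrw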