Re: cephfs, low performances

Because Ceph is not perfectly distributed, there will be more PGs/objects on some drives than on others, and those drives become a bottleneck for the entire cluster. The current IO scheduler poses some challenges in this regard. I've implemented a new scheduler with which I've seen much better drive utilization across the cluster, a 3-17% performance increase, and a substantial reduction in client performance deviation (all clients get roughly the same amount of performance). Hopefully we will be able to get that into Jewel.
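
For anyone who wants to see this kind of imbalance on their own cluster, a
quick check (assuming a Hammer or later ceph CLI; no extra tooling needed)
is:

    # Per-OSD CRUSH weight and utilization: a wide spread in the %USE/VAR
    # columns means some drives carry noticeably more data than others
    # (newer releases also print a per-OSD PG count).
    ceph osd df

    # Watch actual device utilization while a bench runs; the busiest
    # drive is usually the one holding the most PGs of the busy pool.
    iostat -dxm 5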

Robert LeBlanc

Sent from a mobile device, please excuse any typos.

On Dec 31, 2015 12:20 AM, "Francois Lafont" <flafdivers@xxxxxxx> wrote:
Hi,

On 30/12/2015 10:23, Yan, Zheng wrote:

>> And it seems to me that I can see the bottleneck of my little cluster (only
>> 5 OSD servers, each with 4 osd daemons). According to the "atop" command, I
>> can see that some disks (4TB SATA 7200rpm Western Digital WD4000FYYZ) are
>> very busy. It's curious because during the bench some disks are very busy
>> and some other disks are not so busy. But I think the reason is that it is a
>> little cluster and with just 15 osds (the 5 other osds are full-SSD osds
>> dedicated to cephfs metadata), I can't have a perfect distribution of the
>> data, especially when the bench concerns just a specific file of a few
>> hundred MB.
>
> Do these disks have the same size and performance? Large disks (with
> higher weights) or slow disks are likely to be busy.

The disks are exactly the same model with the same size (4TB SATA 7200rpm
Western Digital WD4000FYYZ). I'm not completely sure, but it seems to me
that in one specific node I have a disk which is a little slower than the
others (maybe ~50-75 IOPS fewer), and it seems to me that it's the busiest
disk during a bench.

Is it possible (or frequent) to have performance differences between disks
of exactly the same model?
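
In case it helps, a direct read-only test against each raw device makes that
kind of difference measurable. A minimal sketch (here /dev/sdX is just a
placeholder, and the numbers are only comparable if the OSD on that disk is
otherwise idle):

    # Read-only random-read test against the raw device;
    # --readonly refuses to issue any write, so the OSD data stays safe.
    fio --name=disk-check --filename=/dev/sdX --readonly \
        --rw=randread --direct=1 --bs=4k --iodepth=32 \
        --ioengine=libaio --runtime=60 --time_based --group_reporting

Running the same job on every OSD disk would make a ~50-75 IOPS gap easy to
confirm or rule out.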

>> That being said, when you talk about "using buffered IO" I'm not sure I
>> understand which fio option is concerned by that. Is it the --buffered
>> option? Because with this option I have noticed no change concerning iops.
>> Personally, I was able to increase global iops only with the --numjobs option.
>>
>
> I didn't make it clear. I actually meant buffered writes (add the
> --rwmixread=0 option to fio).

But with fio, if I set "--readwrite=randrw --rwmixread=0", isn't that
completely equivalent to just setting "--readwrite=randwrite"?
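
To make the comparison concrete, the two invocations in question would look
something like this (directory, block size and job size are only placeholders):

    # Mixed random read/write with --rwmixread=0, i.e. 0% reads.
    fio --name=mix0 --directory=/mnt/cephfs/bench --rw=randrw --rwmixread=0 \
        --bs=4k --size=512m --runtime=60 --time_based --group_reporting

    # Plain random-write job for comparison.
    fio --name=randw --directory=/mnt/cephfs/bench --rw=randwrite \
        --bs=4k --size=512m --runtime=60 --time_based --group_reporting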

> In your test case, writes mix with reads.

Yes indeed.

> Reads are synchronous on a cache miss.

You mean that I get SYNC IO for reads even if I set --direct=0, is that
correct? Is that valid for any file system, or just for cephfs?
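
For reference, a pure buffered-write job, which avoids the synchronous
read-on-cache-miss path entirely, could look something like this (the cephfs
mount point and sizes are only placeholders):

    # Buffered random writes only: --direct=0 keeps the page cache in play,
    # and --rw=randwrite means no reads, hence no cache-miss stalls.
    fio --name=bufwrite --directory=/mnt/cephfs/bench --rw=randwrite \
        --direct=0 --bs=4k --size=512m --numjobs=4 \
        --runtime=60 --time_based --group_reporting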

Regards.

--
François Lafont
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
