On 20/12/2015 22:51, Don Waterloo wrote:

> All nodes have 10Gbps to each other

Even the link client node <---> cluster nodes?

> OSD:
> $ ceph osd tree
> ID WEIGHT  TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 5.48996 root default
> -2 0.89999     host nubo-1
>  0 0.89999         osd.0          up  1.00000          1.00000
> -3 0.89999     host nubo-2
>  1 0.89999         osd.1          up  1.00000          1.00000
> -4 0.89999     host nubo-3
>  2 0.89999         osd.2          up  1.00000          1.00000
> -5 0.92999     host nubo-19
>  3 0.92999         osd.3          up  1.00000          1.00000
> -6 0.92999     host nubo-20
>  4 0.92999         osd.4          up  1.00000          1.00000
> -7 0.92999     host nubo-21
>  5 0.92999         osd.5          up  1.00000          1.00000
>
> Each contains 1 x Samsung 850 Pro 1TB SSD (on sata)
>
> Each are Ubuntu 15.10 running 4.3.0-040300-generic kernel.
> Each are running ceph 0.94.5-0ubuntu0.15.10.1
>
> nubo-1/nubo-2/nubo-3 are 2x X5650 @ 2.67GHz w/ 96GB ram.
> nubo-19/nubo-20/nubo-21 are 2x E5-2699 v3 @ 2.30GHz, w/ 576GB ram.
>
> the connections are to the chipset sata in each case.
> The fio test to the underlying xfs disk
> (e.g. cd /var/lib/ceph/osd/ceph-1; fio --randrepeat=1 --ioengine=libaio
> --direct=1 --gtod_reduce=1 --name=readwrite --filename=rw.data --bs=4k
> --iodepth=64 --size=5000MB --readwrite=randrw --rwmixread=50)
> shows ~22K IOPS on each disk.
>
> nubo-1/2/3 are also the mon and the mds:
> $ ceph status
>     cluster b23abffc-71c4-4464-9449-3f2c9fbe1ded
>      health HEALTH_OK
>      monmap e1: 3 mons at {nubo-1=10.100.10.60:6789/0,nubo-2=10.100.10.61:6789/0,nubo-3=10.100.10.62:6789/0}
>             election epoch 1104, quorum 0,1,2 nubo-1,nubo-2,nubo-3
>      mdsmap e621: 1/1/1 up {0=nubo-3=up:active}, 2 up:standby
>      osdmap e2459: 6 osds: 6 up, 6 in
>       pgmap v127331: 840 pgs, 6 pools, 144 GB data, 107 kobjects
>             289 GB used, 5332 GB / 5622 GB avail
>                  840 active+clean
>   client io 0 B/s rd, 183 kB/s wr, 54 op/s

And you have "replica size == 3" in your cluster, correct?
Do you have specific mount options, or specific options in ceph.conf concerning ceph-fuse?

So overall the hardware configuration of your cluster seems clearly better than mine (config given in my first message): you have 10Gb links everywhere (between the client and the cluster I have only 1Gb) and your OSDs are all SSD.

I have tried to put _all_ of cephfs on my SSD, i.e. the pools "cephfsdata" _and_ "cephfsmetadata" are now on the SSD. Performance is slightly improved, I get ~670 IOPS now (with the fio command of my first message again), but that still seems poor to me.

In fact, I'm curious to hear the opinion of cephfs experts on what IOPS we can expect. If ~700 IOPS is actually a normal figure for our hardware configuration, then maybe we are looking for a problem that doesn't exist...

--
François Lafont
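
For reference, one way to answer the replica-size and ceph-fuse questions on the cluster side is something like the commands below ("cephfsdata" and "cephfsmetadata" are just the pool names used in this thread, and the [client] section is only an example of where ceph-fuse options would normally live):

$ ceph osd dump | grep "^pool"                      # shows "replicated size N" per pool
$ ceph osd pool get cephfsdata size
$ ceph osd pool get cephfsmetadata size
$ mount | grep ceph                                 # mount options actually in effect on the client
$ sed -n '/\[client\]/,/^\[/p' /etc/ceph/ceph.conf  # any client-side ceph-fuse options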
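And for what it's worth, pinning the cephfs pools to an SSD-only part of the CRUSH map is usually done along these lines. This is only a sketch: it assumes a CRUSH root named "ssd" that already contains the SSD OSDs, "ssd_rule" is an arbitrary name, <ruleset-id> must be replaced by the id reported by the dump, and on hammer (0.94.x) the pool option is still called crush_ruleset:

$ ceph osd crush rule create-simple ssd_rule ssd host
$ ceph osd crush rule dump ssd_rule                 # note the "ruleset" id
$ ceph osd pool set cephfsdata crush_ruleset <ruleset-id>
$ ceph osd pool set cephfsmetadata crush_ruleset <ruleset-id>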
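And just to be explicit about what is being measured on the cephfs side: running the same 4k randrw fio test as Don's command above, but from inside a ceph-fuse mount, would look roughly like this (the mount point /mnt/cephfs is an assumption):

$ cd /mnt/cephfs
$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
      --name=readwrite --filename=rw.data --bs=4k --iodepth=64 \
      --size=5000MB --readwrite=randrw --rwmixread=50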