Re: cephfs, low performances

On 20 December 2015 at 19:23, Francois Lafont <flafdivers@xxxxxxx> wrote:
On 20/12/2015 22:51, Don Waterloo wrote:

> All nodes have 10Gbps to each other

Even the link between the client node and the cluster nodes?

> OSD:
> $ ceph osd tree
> ID WEIGHT  TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 5.48996 root default
> -2 0.89999     host nubo-1
>  0 0.89999         osd.0         up  1.00000          1.00000
> -3 0.89999     host nubo-2
>  1 0.89999         osd.1         up  1.00000          1.00000
> -4 0.89999     host nubo-3
>  2 0.89999         osd.2         up  1.00000          1.00000
> -5 0.92999     host nubo-19
>  3 0.92999         osd.3         up  1.00000          1.00000
> -6 0.92999     host nubo-20
>  4 0.92999         osd.4         up  1.00000          1.00000
> -7 0.92999     host nubo-21
>  5 0.92999         osd.5         up  1.00000          1.00000
>
> Each contains 1 x Samsung 850 Pro 1TB SSD (on sata)
>
> Each are Ubuntu 15.10 running 4.3.0-040300-generic kernel.
> Each are running ceph 0.94.5-0ubuntu0.15.10.1
>
> nubo-1/nubo-2/nubo-3 are 2x X5650 @ 2.67GHz w/ 96GB ram.
> nubo-19/nubo-20/nubo-21 are 2x E5-2699 v3 @ 2.30GHz, w/ 576GB ram.
>
> the connections are to the chipset sata in each case.
> The fio test to the underlying xfs disk
> (e.g. cd /var/lib/ceph/osd/ceph-1; fio --randrepeat=1 --ioengine=libaio
> --direct=1 --gtod_reduce=1 --name=readwrite --filename=rw.data --bs=4k
> --iodepth=64 --size=5000MB --readwrite=randrw --rwmixread=50)
> shows ~22K IOPS on each disk.
>
> nubo-1/2/3 are also the mon and the mds:
> $ ceph status
>     cluster b23abffc-71c4-4464-9449-3f2c9fbe1ded
>      health HEALTH_OK
>      monmap e1: 3 mons at {nubo-1=
> 10.100.10.60:6789/0,nubo-2=10.100.10.61:6789/0,nubo-3=10.100.10.62:6789/0}
>             election epoch 1104, quorum 0,1,2 nubo-1,nubo-2,nubo-3
>      mdsmap e621: 1/1/1 up {0=nubo-3=up:active}, 2 up:standby
>      osdmap e2459: 6 osds: 6 up, 6 in
>       pgmap v127331: 840 pgs, 6 pools, 144 GB data, 107 kobjects
>             289 GB used, 5332 GB / 5622 GB avail
>                  840 active+clean
>   client io 0 B/s rd, 183 kB/s wr, 54 op/s

And you have "replica size == 3" in your cluster, correct?
Do you have any specific mount options, or any ceph-fuse-related options in ceph.conf?
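
For example, something like this should show the replica size of each pool:

$ ceph osd dump | grep 'replicated size'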

So the hardware configuration of your cluster seems to me globally much better
than mine (config given in my first message), because you have 10Gb links
(between the client and the cluster I have just 1Gb) and you have all-SSD OSDs.

I have tried to put _all_ of cephfs on my SSD: ie the pools "cephfsdata" _and_
"cephfsmetadata" are both on the SSD. Performance is slightly improved, because
I now get ~670 IOPS (with the fio command of my first message again), but it
still seems bad to me.
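
In case it helps to see the details, a pool can be pointed at an SSD-only CRUSH
rule with something like this (assuming ruleset 1 is the SSD rule; the actual
id may differ):

$ ceph osd pool set cephfsdata crush_ruleset 1
$ ceph osd pool set cephfsmetadata crush_ruleset 1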

In fact, I'm curious to hear the opinion of "cephfs" experts about what IOPS
we can expect. Maybe ~700 IOPS is actually correct for our hardware
configuration and we are searching for a problem which doesn't exist...

All nodes are interconnected on 10G (actually 8x10G, so 80Gbps, but I have 7 of the 8 links disabled for this test). I have done an iperf test over TCP and verified I can achieve ~9Gbps between each pair of nodes. I have jumbo frames enabled (so 9000 MTU, 8982 route MTU).
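
Roughly, the checks were along these lines (the peer host and interface names here are just examples):

$ iperf -s                        # on one node
$ iperf -c nubo-2                 # from another node: ~9 Gbit/s
$ ip link show eth0 | grep mtu    # confirms mtu 9000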

I have replica 2.

My 2 cephfs pools are:

pool 12 'cephfs_metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 2239 flags hashpspool stripe_width 0
pool 13 'cephfs_data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 2243 flags hashpspool crash_replay_interval 45 stripe_width 0
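
(That is from the osd dump; something like "ceph osd dump | grep cephfs" shows the same lines.)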

With ceph-fuse, I used the defaults except that I added noatime.
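
So the mount is essentially something like this (the mount point is just an example):

$ ceph-fuse -m 10.100.10.60:6789 /mnt/cephfs -o noatime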

My ceph.conf is:

[global]
fsid = XXXX
mon_initial_members = nubo-2, nubo-3, nubo-1
mon_host = 10.100.10.61,10.100.10.62,10.100.10.60
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
public_network = 10.100.10.0/24
osd op threads = 6
osd disk threads = 6

[mon]
    mon clock drift allowed = .600
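
If it helps, the running values can be confirmed on an OSD host with something like:

$ ceph daemon osd.0 config show | grep 'osd_op_threads\|osd_disk_threads'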
 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
