Re: Ceph performances

Hello,

I just saw the release announcement of Infernalis. I will test it in the meantime.

Rémi

On 07/11/2015 09:24, Rémi BUISSON wrote:
Hi guys,

I need your help to figure out performance issues on my Ceph cluster. I've read pretty much every thread on the net about this topic, but I haven't managed to get acceptable performance. At my company, we are planning to replace the NAS of our existing virtualization infrastructure with a Ceph cluster in order to improve the platform's overall performance, scalability and security. The current NAS handles about 50k IOPS.

For this we bought:
2 x NFS servers: 2 x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 32 GB RAM, 2 x 10Gbps network interfaces (bonding)
3 x MON servers: 1 x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz, 16 GB RAM, 2 x 10Gbps network interfaces (bonding)
2 x MDS servers: 2 x Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz, 32 GB RAM, 2 x 10Gbps network interfaces (bonding)
2 x OSD servers (cache): 2 x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 256 GB RAM, 2 x SSD INTEL SSDSC2BX200G4 (200 GB) for journal, 6 x SSD INTEL SSDSC2BX016T4R (1.4 TB) for data, 2 x 10Gbps network interfaces (bonding)
4 x OSD servers (storage): 2 x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 256 GB RAM, 4 x SSD TOSHIBA PX02SMF020 (200 GB) for journal, 18 x HGST Ultrastar HUC101818CS4204 (1.8 TB) for data, 2 x 10Gbps network interfaces (bonding)

This gives a total of 84 OSDs.

I created two 4096-PG pools, one called rbd-cold-storage and the other rbd-hot-storage. As you may guess, rbd-cold-storage is backed by the 4 OSD servers with spinning disks and rbd-hot-storage by the 2 OSD servers with SSDs. On rbd-cold-storage, I created an RBD device which is mapped on the NFS server.
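
For reference, the pools and the RBD image were created along these lines (the image name and size are illustrative, and the crush_ruleset ids depend on the CRUSH rules that separate the SSD and HDD hosts):

# two replicated pools with 4096 PGs each
ceph osd pool create rbd-cold-storage 4096 4096
ceph osd pool create rbd-hot-storage 4096 4096
# point each pool at the matching CRUSH rule (rule ids are site-specific)
ceph osd pool set rbd-cold-storage crush_ruleset 1
ceph osd pool set rbd-hot-storage crush_ruleset 2
# create an RBD image on the cold pool and map it on the NFS server
rbd create rbd-cold-storage/nfs-export --size 1048576   # size in MB, i.e. 1 TB
rbd map rbd-cold-storage/nfs-export                     # shows up as /dev/rbd0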

I benched each of the SSDs we have and each can handle 40k IOPS. As my replication factor is 2, the theoretical performance of the cluster is (2 servers x 6 cache SSDs x 40k) / 2 = 240k IOPS.
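
The per-SSD figure comes from a direct 4k random-write fio run on each device, something like this (device name is illustrative and the test is destructive):

fio --name=ssd-bench --filename=/dev/sdb --ioengine=libaio --direct=1 \
    --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 --runtime=60 --group_reporting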

I'm currently benchmarking the cluster with the fio tool from one NFS server. Here is my fio job file:
[global]
ioengine=libaio
iodepth=32
runtime=300
direct=1
filename=/dev/rbd0
group_reporting=1
gtod_reduce=1
randrepeat=1
size=4G
numjobs=1

[4k-rand-write]
new_group
bs=4k
rw=randwrite
stonewall
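
The job is launched from the NFS server against the mapped device, simply with something like (the job file name is illustrative):

fio 4k-rand-write.fio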

The problem is that I can't get more than 15k IOPS for writes. In my monitoring engine, I can see that each of the cache OSD SSDs is doing no more than 2.5k IOPS, which seems to correspond to 6 x 2.5k = 15k IOPS. I don't expect to reach the theoretical value, but reaching 100k IOPS would be perfect.
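
One way to cross-check what the monitoring engine reports is to watch the devices and OSDs directly while the benchmark runs, for example:

# on each cache-tier OSD host: per-device IOPS and utilisation
iostat -x 1
# from any node with an admin keyring: per-OSD commit/apply latencies
ceph osd perf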

My cluster is running on Debian Jessie with the Ceph Hammer v0.94.5 Debian package (compiled with the --with-jemalloc option; I also tried without). Here is my ceph.conf:


[global]
fsid = 5046f766-670f-4705-adcc-290f434c8a83

# basic settings
mon initial members = a01cepmon001,a01cepmon002,a01cepmon003
mon host = 10.10.69.254,10.10.69.253,10.10.69.252
mon osd allow primary affinity = true
# network settings
public network = 10.10.69.128/25
cluster network = 10.10.69.0/25

# auth settings
auth cluster required = cephx
auth service required = cephx
auth client required = cephx

# default pools settings
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 8192
osd pool default pgp num = 8192
osd crush chooseleaf type = 1

# debug settings
debug lockdep = 0/0
debug context = 0/0
debug crush = 0/0
debug buffer = 0/0
debug timer = 0/0
debug journaler = 0/0
debug osd = 0/0
debug optracker = 0/0
debug objclass = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0

throttler perf counter = false
osd enable op tracker = false

## OSD settings
[osd]
# OSD FS settings
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = rw,noatime,logbsize=256k,delaylog

# OSD journal settings
osd journal block align = true
osd journal aio = true
osd journal dio = true

# Performance tuning
filestore xattr use omap = true
filestore merge threshold = 40
filestore split multiple = 8
filestore max sync interval = 10
filestore queue max ops = 100000
filestore queue max bytes = 1GiB
filestore op threads = 20
filestore journal writeahead = true
filestore fd cache size = 10240
osd op threads = 8
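
A quick way to confirm that the OSDs actually picked up these values, or to change one at runtime without restarting, is something like:

# on an OSD host, dump the running configuration of one daemon
ceph daemon osd.0 config show | grep filestore_queue
# or push a value to all OSDs on the fly
ceph tell osd.\* injectargs '--filestore_max_sync_interval 10'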

Disabling throttling doesn't change anything.
So after everything I've read, I would like to know: since those threads from a few months ago, has anyone managed to fix this kind of problem? Any ideas or thoughts on how to improve this?

Thanks.

Rémi
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
