Re: cephfs, low performances

Hello,

On Fri, 18 Dec 2015 03:36:12 +0100 Francois Lafont wrote:

> Hi,
> 
> I have a Ceph cluster that is currently unused and I get (to my mind)
> very low performance. I'm no expert at benchmarks; here is an example
> of a quick bench:
> 
> ---------------------------------------------------------------
> # fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
>     --name=readwrite --filename=rw.data --bs=4k --iodepth=64 \
>     --size=300MB --readwrite=randrw --rwmixread=50
> readwrite: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
> fio-2.1.3
> Starting 1 process
> readwrite: Laying out IO file(s) (1 file(s) / 300MB)
> Jobs: 1 (f=1): [m] [100.0% done] [2264KB/2128KB/0KB /s] [566/532/0 iops] [eta 00m:00s]
> readwrite: (groupid=0, jobs=1): err= 0: pid=3783: Fri Dec 18 02:01:13 2015
>   read : io=153640KB, bw=2302.9KB/s, iops=575, runt= 66719msec
>   write: io=153560KB, bw=2301.7KB/s, iops=575, runt= 66719msec
>   cpu          : usr=0.77%, sys=3.07%, ctx=115432, majf=0, minf=604
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
>      issued    : total=r=38410/w=38390/d=0, short=r=0/w=0/d=0
> 
> Run status group 0 (all jobs):
>    READ: io=153640KB, aggrb=2302KB/s, minb=2302KB/s, maxb=2302KB/s, mint=66719msec, maxt=66719msec
>   WRITE: io=153560KB, aggrb=2301KB/s, minb=2301KB/s, maxb=2301KB/s, mint=66719msec, maxt=66719msec
> ---------------------------------------------------------------
> 
> It seems very bad to me.
Indeed.
Firstly, let me state that I don't use CephFS and have no clue how it
influences things here or how it can/should be tuned.

That being said, the fio above, running in a VM (RBD) against a single
OSD storage server (replica 1) with 4 crappy HDDs and on-disk journals
on my test cluster (1Gb/s links), gives me 440 IOPS.
So yeah, given your configuration that's bad.

In comparison I get 3000 IOPS against a production cluster (so not idle)
with 4 storage nodes, each with 4x 100GB DC S3700s for journals and OS,
8 SATA HDDs, and Infiniband (IPoIB) connectivity for everything.

All of this is with 0.80.x (Firefly) on Debian Jessie.
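
To rule out CephFS itself, it may also be worth benchmarking the RADOS
layer directly with rados bench. A rough sketch; the pool name
"cephfs_data" is only a guess, substitute whatever your data pool is
actually called:

---------------------------------------------------------------
# 60 seconds of 4KB writes with 64 concurrent ops, keeping the
# objects around so the read test below has something to read
rados bench -p cephfs_data 60 write -b 4096 -t 64 --no-cleanup

# 60 seconds of random 4KB reads against those objects
rados bench -p cephfs_data 60 rand -t 64

# remove the benchmark objects afterwards
rados -p cephfs_data cleanup
---------------------------------------------------------------

If those numbers are in the same sorry ballpark, the problem is below
CephFS, not in it.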


> Can I hope for better results with my setup (explained below)? During
> the bench I don't see any particular symptoms (no CPU stuck at 100%,
> etc.). If you have advice on improving the performance and/or on making
> smarter benchmarks, I'm really interested.
> 
You want to run atop on all your nodes and watch everything from disk to
network utilization.
There might be nothing obvious going on, but it needs to be ruled out.
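
If you want something to look at after the fact, atop can also record to
a file while the fio run is going on; the path and interval below are
just examples:

---------------------------------------------------------------
# on every node: sample all system activity every 5 seconds and
# write it to a raw log for later inspection
atop -w /var/tmp/atop_bench.raw 5

# after the benchmark: replay the log and step through the samples
atop -r /var/tmp/atop_bench.raw
---------------------------------------------------------------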

> Thanks in advance for your help. Here is my conf...
> 
> I use Ubuntu 14.04 with the 3.13 kernel on each server (the same on the
> client where I run my bench), and Ceph 9.2.0 (Infernalis).

I seem to recall that this particular kernel has issues; you might want
to scour the list archives.

> On the client, CephFS is mounted via ceph-fuse with this
> in /etc/fstab:
> 
> id=cephfs,keyring=/etc/ceph/ceph.client.cephfs.keyring,client_mountpoint=/	/mnt/cephfs
> fuse.ceph	noatime,defaults,_netdev	0	0
> 
> I have 5 cluster node servers "Supermicro Motherboard X10SLM+-LN4 S1150"
> with one 1GbE port for the ceph public network and one 10GbE port for
> the ceph private network:
> 
For the sake of latency (which becomes the biggest issue when you're not
exhausting CPU/disk), you'd be better off with everything on 10GbE, unless
you need the 1GbE to connect to clients that have no 10Gb/s ports.
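
It's easy enough to eyeball the latency from the client, for example
(10.0.2.101 being one of your monitor/OSD nodes):

---------------------------------------------------------------
# round-trip times from the client to a storage node over the
# 1GbE public network
ping -c 100 -i 0.2 10.0.2.101
---------------------------------------------------------------

On 1GbE you'll typically see a few hundred microseconds per round trip,
noticeably less on 10GbE or IPoIB, and every OSD operation pays that at
least once.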

> - 1 x Intel Xeon E3-1265Lv3
> - 1 x Intel SSD DC S3710 Series 200GB (with partitions for the OS and
>   the 3 OSD journals; on ceph01, ceph02 and ceph03 the SSD also holds a
>   partition for a monitor's working directory)
The 200GB DC S3700 would have been faster, but that's a moot point and not
your bottleneck for sure.

> - 3 x 4TB Western Digital (WD) SATA 7200rpm HDDs
> - 32GB RAM
> - no RAID controller

Which controller are you using?
I recently came across an Adaptec SATA3 HBA that delivered only 176 MB/s
writes with 200GB DC S3700s, as opposed to 280MB/s with Intel onboard
SATA-3 ports or an LSI 9211-4i HBA.
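
A quick way to see what the journal SSD actually delivers for the kind of
small synchronous writes the OSD journal does is a run like the one below
(the file path is just an example; point it at a scratch file on a
filesystem on the SSD, not at the raw device):

---------------------------------------------------------------
# identify the controller the disks hang off
lspci | grep -iE 'sata|sas|raid'

# 60 seconds of 4KB O_SYNC writes, one in flight, O_DIRECT --
# roughly the access pattern of the filestore journal
fio --name=journal-test --filename=/mnt/ssd/journal-test.dat \
    --size=1G --bs=4k --rw=write --sync=1 --iodepth=1 \
    --direct=1 --runtime=60 --time_based
---------------------------------------------------------------

If the result is far below what Intel specs for that drive, the
controller (or its settings) is a good suspect.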

Regards,

Christian

> - Each partition uses XFS with the noatime option, except the OS
> partition, which is ext4.
> 
> Here is my ceph.conf :
> 
> ---------------------------------------------------------------
> [global]
>   fsid                           = xxxxxxxxxxxxxxxxxxxxxxxxxxxx
>   cluster network                = 192.168.22.0/24
>   public network                 = 10.0.2.0/24
>   auth cluster required          = cephx
>   auth service required          = cephx
>   auth client required           = cephx
>   filestore xattr use omap       = true
>   osd pool default size          = 3
>   osd pool default min size      = 1
>   osd pool default pg num        = 64
>   osd pool default pgp num       = 64
>   osd crush chooseleaf type      = 1
>   osd journal size               = 0
>   osd max backfills              = 1
>   osd recovery max active        = 1
>   osd client op priority         = 63
>   osd recovery op priority       = 1
>   osd op threads                 = 4
>   mds cache size                 = 1000000
>   osd scrub begin hour           = 3
>   osd scrub end hour             = 5
>   mon allow pool delete          = false
>   mon osd down out subtree limit = host
>   mon osd min down reporters     = 4
> 
> [mon.ceph01]
>   host     = ceph01
>   mon addr = 10.0.2.101
> 
> [mon.ceph02]
>   host     = ceph02
>   mon addr = 10.0.2.102
> 
> [mon.ceph03]
>   host     = ceph03
>   mon addr = 10.0.2.103
> ---------------------------------------------------------------
> 
> The MDS daemons are in active/standby mode.
> 
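
PS: One thing that jumps out of the ceph.conf above: with 15 OSDs and only
64 PGs per pool you end up with rather few placement groups, which mostly
hurts data distribution rather than raw IOPS. The usual rule of thumb is
roughly OSDs * 100 / replica size, rounded up to a power of two, so
something like 512 for your data pool. Assuming your data pool is called
"cephfs_data" (adjust to taste), and keeping in mind pg_num can only ever
be increased:

---------------------------------------------------------------
ceph osd pool set cephfs_data pg_num 512
ceph osd pool set cephfs_data pgp_num 512
---------------------------------------------------------------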


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


