Re: Ceph all NVME Cluster sequential read speed

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of wido@xxxxxxxx
> Sent: 18 August 2016 09:35
> To: nick <nick@xxxxxxx>
> Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re:  Ceph all NVME Cluster sequential read speed
> 
> 
> 
> > On 18 Aug 2016, at 10:15, nick <nick@xxxxxxx> wrote:
> >
> > Hi,
> > we are currently building a new ceph cluster with only NVME devices.
> > One Node consists of 4x Intel P3600 2TB devices. Journal and filestore
> > are on the same device. Each server has a 10 core CPU and uses 10 GBit
> > ethernet NICs for public and ceph storage traffic. We are currently testing with 4 nodes overall.
> >
> > The cluster will be used only for virtual machine images via RBD. The
> > pools are replicated (no EC).
> >
> > Although we are pretty happy with the single threaded write
> > performance, the single threaded (iodepth=1) sequential read
> > performance is a bit disappointing.
> >
> > We are testing with fio and the rbd engine. After creating a 10GB RBD
> > image, we use the following fio params to test:
> > """
> > [global]
> > invalidate=1
> > ioengine=rbd
> > iodepth=1
> > ramp_time=2
> > size=2G
> > bs=4k
> > direct=1
> > buffered=0
> > """
> >
> > For a 4k workload we are reaching 1382 IOPS. Testing one NVME device
> > directly (with psync engine and iodepth of 1) we can reach up to 84176
> > IOPS. This is a big difference.
> >
> 
> The network is a big difference as well. Keep in mind that the Ceph OSDs also have to process the I/O.
> 
> For example, if you have a network latency of 0.200 ms, then in 1,000 ms (1 sec) you will potentially be able to do 5,000 IOPS,
> but that is without the OSD or any other layers doing any work.
> 
> 
> > I already read that the read_ahead setting might improve the
> > situation, although this would only be true when using buffered reads, right?
> >
> > Does anyone have other suggestions to get better serial read performance?
> >
> 
> You might want to disable all logging and look at AsyncMessenger. Disabling cephx might help, but that is not very safe to do.
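
(For reference, "disable all logging" generally means zeroing the debug levels in ceph.conf on the OSD nodes. The subsystems below are only the usual starting point for a sketch like this, so double-check the option names against your Ceph release:)

"""
[global]
# silence per-IO debug/log output in the hot path
debug ms = 0/0
debug osd = 0/0
debug filestore = 0/0
debug journal = 0/0
debug auth = 0/0
"""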

Just to add to what Wido has mentioned: the problem is latency serialisation. The network and the Ceph code mean that each IO
request has to travel much further than it would over a local SATA cable.

The trick is to remove as much of this latency as you can. Wido has already mentioned one good option, turning off logging.
One thing I have found which helps massively is to force the CPU C-state to 1 and pin the CPUs at their maximum frequency. Otherwise
the CPUs can spend up to 200us waking up from deep sleep several times per IO. Doing this I managed to get my 4kB write latency
for a 3x replica pool down to 600us!
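
To put that in perspective (rough arithmetic only, using the numbers from this thread): at iodepth=1 every IO waits for the previous
one to complete, so the serial latency sets a hard ceiling on IOPS. 600us per IO works out to 1 / 0.0006 = ~1,666 IOPS, and your
measured 1,382 IOPS implies roughly 1 / 1,382 = ~720us per IO end to end. That is why shaving even 100-200us of C-state wake-up or
logging overhead per IO shows up directly in the single-threaded numbers.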

So stick this on your kernel boot line:

intel_idle.max_cstate=1
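
On a Debian/Ubuntu style system that would look something like this (other distros and bootloaders differ, e.g. grub2-mkconfig on RHEL/CentOS):

"""
# /etc/default/grub -- append to the existing cmdline,
# then run update-grub and reboot
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_idle.max_cstate=1"
"""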

and stick this somewhere like your rc.local:

echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct

Although there may be some gains from setting it to 90-95% instead, so that when only one core is active it can turbo slightly higher.
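
If you want to sanity check that both settings took effect after a reboot, something like this works on most boxes running intel_idle/intel_pstate (exact sysfs paths can vary with the kernel):

"""
# should report 1
cat /sys/module/intel_idle/parameters/max_cstate
# should show the value you set above (100, or 90-95)
cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
# should sit at or near max frequency even when mostly idle
grep . /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
"""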

Also, since you are using the RBD engine in fio, you should be able to use readahead caching even with direct IO. You just need to
enable it in your ceph.conf on the client machine where you are running fio.
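
Something along these lines should do it; the values are only a starting point to experiment with, and the option names are worth checking against the docs for your Ceph release:

"""
[client]
rbd cache = true
# start prefetching after the first sequential request
rbd readahead trigger requests = 1
# read ahead up to 4 MB at a time
rbd readahead max bytes = 4194304
# 0 = never turn readahead off, regardless of how much has been read
rbd readahead disable after bytes = 0
"""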

Nick

> 
> Wido
> 
> > Cheers
> > Nick
> >
> > --
> > Sebastian Nickel
> > Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich Tel
> > +41 44 637 40 00 | Support +41 44 637 40 40 | www.nine.ch

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


