Re: Very low 4k randread performance ~1000iops

"Tuomas Juntunen" <tuomas.juntunen@xxxxxxxxxxxxxxx> · Thu, 2 Jul 2015 05:55:56 +0300

I've now read all the messages relating to EnhanceIO, and from what I can tell it would not help here, at least not in the way I would want it to.

Thanks Christian for pointing this out.

Br, T

-----Original Message-----
From: Christian Balzer [mailto:chibi@xxxxxxx] 
Sent: 2 July 2015 4:30
To: ceph-users@xxxxxxxxxxxxxx
Cc: Mark Nelson; Tuomas Juntunen
Subject: Re:  Very low 4k randread performance ~1000iops

On Wed, 01 Jul 2015 13:50:39 -0500 Mark Nelson wrote:

> 
> 
> On 07/01/2015 01:39 PM, Tuomas Juntunen wrote:
> > Thanks Mark
> >
> > Are there any plans to bring a ZFS-like L2ARC to Ceph, or is cache
> > tiering meant to work like this in the future?
> >
> > I have tested a cache tier + EC pool, and that created too much load
> > on our servers, so it was not viable.
> 
> We are doing a lot of work in this space right now.  Hopefully we'll
> see improvements in the coming releases.
> 
Another obvious and rather effective way to improve reads is, of course, to have your hot objects in the page cache of the storage servers.
Meaning: add as much memory there as you can afford.

> >
> > I was also wondering if EnhanceIO would be a good solution for
> > getting more random IOPS. I've read some of Sébastien's writings.
> 
> Possibly!  Try it and let us know. ;)
> 
As somebody who reads pretty much everything that gets posted to this ML, and since this has come up numerous times, I'd suggest scouring the archives.

My impression is that none of these things work, otherwise my test cluster here would include it.
They range from "abandoned project" through "doesn't help" to "will eat your data".

Christian
> >
> > Br,
> > Tuomas
> >
> >
> > -----Original Message-----
> > From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
> > Sent: 1 July 2015 20:29
> > To: Tuomas Juntunen; ceph-users@xxxxxxxxxxxxxx
> > Subject: Re:  Very low 4k randread performance ~1000iops
> >
> > On 07/01/2015 12:13 PM, Tuomas Juntunen wrote:
> >> Hi
> >>
> >> Yes, the OSDs are on spinning disks, and we have 18 SSDs for
> >> journals, one SSD per two OSDs.
> >>
> >> The OSDs are:
> >> Model Family:     Seagate Barracuda 7200.14 (AF)
> >> Device Model:     ST2000DM001-1CH164
> >>
> >> As I understand it, the journals are not used as a read cache at
> >> all, just for writing. Would an SSD-based cache pool be a viable
> >> solution here?
> >
> > Ok, so that makes more sense. The performance is still lower than
> > expected, but maybe by 3-4x rather than several orders of magnitude.  My
> > guess is that cache tiering in its current form probably won't help
> > you much unless you have a workload that fits mostly into the cache.
> > The promotion penalty is really high, though, so we will likely have
> > to promote much more slowly than we currently do.
> >
> > Mark
> >
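A back-of-the-envelope check of the "3-4x" estimate above, as a minimal Python sketch. The 8.5 ms average seek time is an assumption (a typical figure for 7200 rpm desktop drives, not a number given in this thread), and the sketch assumes that every read misses all caches and that reads spread evenly over all 36 OSDs.

```python
# Rough random-read ceiling for the spinning disks described in this thread.
RPM = 7200            # Seagate Barracuda 7200.14, per the thread
AVG_SEEK_MS = 8.5     # ASSUMPTION: typical desktop-drive average seek time
OSD_COUNT = 36        # per the thread
MEASURED_IOPS = 1000  # the ~1k random read IOPS reported in the thread


def disk_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Estimate random IOPS of one spindle from seek + rotational latency."""
    # On average the platter has to turn half a revolution per request.
    rotational_ms = 0.5 * 60_000 / rpm
    return 1000 / (avg_seek_ms + rotational_ms)


per_disk = disk_random_iops(RPM, AVG_SEEK_MS)
cluster = per_disk * OSD_COUNT
gap = cluster / MEASURED_IOPS

print(f"per disk: {per_disk:.0f} IOPS")
print(f"cluster ceiling: {cluster:.0f} IOPS")
print(f"gap vs. measured: {gap:.1f}x")
```

Under these assumptions the spindles alone would allow somewhere near 2,800 random read IOPS, roughly three times the ~1,000 measured. A different seek-time assumption shifts the result, so treat it as an order-of-magnitude check only.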
> >>
> >> Br, T
> >>
> >> -----Original Message-----
> >> From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
> >> Sent: 1 July 2015 13:58
> >> To: Tuomas Juntunen; ceph-users@xxxxxxxxxxxxxx
> >> Subject: Re:  Very low 4k randread performance ~1000iops
> >>
> >>
> >>
> >> On 06/30/2015 10:42 PM, Tuomas Juntunen wrote:
> >>> Hi
> >>>
> >>> For seq reads, here are the latencies:
> >>>        lat (usec) : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.02%, 100=0.03%
> >>>        lat (usec) : 250=1.02%, 500=87.09%, 750=7.47%, 1000=1.50%
> >>>        lat (msec) : 2=0.76%, 4=1.72%, 10=0.19%, 20=0.19%
> >>>
> >>> Random reads:
> >>>        lat (usec) : 10=0.01%
> >>>        lat (msec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=0.03%, 50=0.55%
> >>>        lat (msec) : 100=99.31%, 250=0.08%
> >>>
> >>> 100 msec seems like a lot to me.
> >>
> >> It is, but what's more interesting imho is that it's so consistent.
> >> You don't have some ops completing fast and other ones completing
> >> slowly, holding everything up.  It's like the OSDs are simply
> >> overloaded with concurrent IOs and everything is waiting.  Maybe I'm
> >> confused, are your OSDs on SSDs?  Are there spinning disks involved?
> >> If so, what model(s)?
> >>
> >> You might want to use "collectl -sD -oT" on one of the OSD nodes
> >> during the test and see what the IO to the disk looks like during
> >> random reads, and especially what the svctime for the disks is like.
> >>
> >> Mark
> >>
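The observation above, that everything seems to be waiting in a queue, can be cross-checked with Little's law (throughput = requests in flight / mean latency). This is a minimal Python sketch; it assumes fio keeps the full iodepth=64 outstanding for the whole run and that a single job is running (numjobs is commented out in the job file). The 50-100 msec range comes from fio's "100=99.31%" bucket, which counts completions slower than 50 msec and up to 100 msec.

```python
def iops_from_latency(iodepth: int, latency_s: float) -> float:
    """Little's law: throughput = requests in flight / mean latency."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return iodepth / latency_s


IODEPTH = 64                     # from the fio job file in this thread
LOW_MS, HIGH_MS = 50.0, 100.0    # bounds of fio's "100=" latency bucket

# Slowest case: every request takes the full 100 msec.
slowest = iops_from_latency(IODEPTH, HIGH_MS / 1000)
# Fastest case: every request takes just over 50 msec.
fastest = iops_from_latency(IODEPTH, LOW_MS / 1000)

print(f"expected IOPS range: {slowest:.0f} to {fastest:.0f}")
```

The range this yields, 640 to 1280 IOPS, brackets the ~1000 IOPS reported at the start of the thread, so the throughput figure and the latency histogram are at least consistent with each other under these assumptions.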
> >>>
> >>> Br,T
> >>>
> >>> -----Original Message-----
> >>> From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
> >>> Sent: 30 June 2015 22:01
> >>> To: Tuomas Juntunen; ceph-users@xxxxxxxxxxxxxx
> >>> Subject: Re:  Very low 4k randread performance ~1000iops
> >>>
> >>> Seems reasonable.  What does the latency distribution look like in
> >>> your fio output file?  It would be useful to know if it's universally
> >>> slow or if some ops are taking much longer to complete than others.
> >>>
> >>> Mark
> >>>
> >>> On 06/30/2015 01:27 PM, Tuomas Juntunen wrote:
> >>>> I created a file which has the following parameters
> >>>>
> >>>>
> >>>> [random-read]
> >>>> rw=randread
> >>>> size=128m
> >>>> directory=/root/asd
> >>>> ioengine=libaio
> >>>> bs=4k
> >>>> #numjobs=8
> >>>> iodepth=64
> >>>>
> >>>>
> >>>> Br,T
> >>>> -----Original Message-----
> >>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Mark Nelson
> >>>> Sent: 30 June 2015 20:55
> >>>> To: ceph-users@xxxxxxxxxxxxxx
> >>>> Subject: Re:  Very low 4k randread performance ~1000iops
> >>>>
> >>>> Hi Tuomas,
> >>>>
> >>>> Can you paste the command you ran to do the test?
> >>>>
> >>>> Thanks,
> >>>> Mark
> >>>>
> >>>> On 06/30/2015 12:18 PM, Tuomas Juntunen wrote:
> >>>>> Hi
> >>>>>
> >>>>> It’s probably not hitting the disks, but that really doesn’t
> >>>>> matter. The point is that we have very responsive VMs while
> >>>>> writing, and that is what the users will see.
> >>>>>
> >>>>> The IOPS we get with sequential reads are good, but random
> >>>>> reads are way too low.
> >>>>>
> >>>>> Is using SSDs as OSDs the only way to get it up, or is there
> >>>>> some tunable which would improve it? I would assume Linux caches
> >>>>> reads in memory and serves them from there, but at least right now
> >>>>> we don’t see that happening.
> >>>>>
> >>>>> Br,
> >>>>>
> >>>>> Tuomas
> >>>>>
> >>>>> From: Somnath Roy [mailto:Somnath.Roy@xxxxxxxxxxx]
> >>>>> Sent: 30 June 2015 19:24
> >>>>> To: Tuomas Juntunen; 'ceph-users'
> >>>>> Subject: RE:  Very low 4k randread performance ~1000iops
> >>>>>
> >>>>> Break it down: try fio-rbd to see what performance you are
> >>>>> getting.
> >>>>>
> >>>>> But I am really surprised you are getting > 100k IOPS for
> >>>>> writes. Did you check that it is hitting the disks?
> >>>>>
> >>>>> Thanks & Regards
> >>>>>
> >>>>> Somnath
> >>>>>
> >>>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Tuomas Juntunen
> >>>>> Sent: Tuesday, June 30, 2015 8:33 AM
> >>>>> To: 'ceph-users'
> >>>>> Subject:  Very low 4k randread performance ~1000iops
> >>>>>
> >>>>> Hi
> >>>>>
> >>>>> I have been trying to figure out why our 4k random reads in VMs
> >>>>> are so bad. I am using fio to test this.
> >>>>>
> >>>>> Write : 170k iops
> >>>>>
> >>>>> Random write : 109k iops
> >>>>>
> >>>>> Read : 64k iops
> >>>>>
> >>>>> Random read : 1k iops
> >>>>>
> >>>>> Our setup is:
> >>>>>
> >>>>> 3 nodes with 36 OSDs, 18 SSDs (one SSD for two OSDs); each node
> >>>>> has 64 GB of memory and 2x 6-core CPUs
> >>>>>
> >>>>> 4 monitors running on other servers
> >>>>>
> >>>>> 40 Gbit InfiniBand with IPoIB
> >>>>>
> >>>>> OpenStack: QEMU-KVM for the virtual machines
> >>>>>
> >>>>> Any help would be appreciated
> >>>>>
> >>>>> Thank you in advance.
> >>>>>
> >>>>> Br,
> >>>>>
> >>>>> Tuomas
> >>>>>
> >>>>> _______________________________________________
> >>>>> ceph-users mailing list
> >>>>> ceph-users@xxxxxxxxxxxxxx
> >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>>>

-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
