Re: Very low 4k randread performance ~1000iops

On 07/01/2015 01:39 PM, Tuomas Juntunen wrote:
Thanks Mark

Are there any plans for a ZFS-like L2ARC in Ceph, or is cache tiering what
should work like this in the future?

I have tested a cache tier + EC pool, and that created too much load on our
servers, so it was not viable for us.

We are doing a lot of work in this space right now. Hopefully we'll see improvements in the coming releases.


I was also wondering if EnhanceIO would be a good solution for getting more
random iops. I've read some of Sébastien's writings.

Possibly!  Try it and let us know. ;)


Br,
Tuomas


-----Original Message-----
From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
Sent: 1 July 2015 20:29
To: Tuomas Juntunen; ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Very low 4k randread performance ~1000iops

On 07/01/2015 12:13 PM, Tuomas Juntunen wrote:
Hi

Yes, the OSDs are on spinning disks and we have 18 SSDs for journals,
one SSD for two OSDs.

The OSDs are:
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST2000DM001-1CH164

From what I've understood, the journals are not used as a read cache at all,
just for writing. Would an SSD-based cache pool be a viable solution here?

Ok, so that makes more sense. The performance is still lower than expected,
but maybe 3-4x rather than several orders of magnitude.  My guess is that
cache tiering in its current form probably won't help you much unless you
have a workload that fits mostly into the cache.  The promotion penalty is
really high, though, so we will likely have to promote much more slowly than
we currently do.

Mark


Br, T

-----Original Message-----
From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
Sent: 1 July 2015 13:58
To: Tuomas Juntunen; ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Very low 4k randread performance ~1000iops



On 06/30/2015 10:42 PM, Tuomas Juntunen wrote:
Hi

For seq reads here's the latencies:
       lat (usec) : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.02%, 100=0.03%
       lat (usec) : 250=1.02%, 500=87.09%, 750=7.47%, 1000=1.50%
       lat (msec) : 2=0.76%, 4=1.72%, 10=0.19%, 20=0.19%

Random reads:
       lat (usec) : 10=0.01%
       lat (msec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=0.03%, 50=0.55%
       lat (msec) : 100=99.31%, 250=0.08%

100 ms seems like a lot to me.

It is, but what's more interesting imho is that it's so consistent.  You
don't have some ops completing fast and other ones completing slowly holding
everything up.  It's like the OSDs are simply overloaded with concurrent IOs
and everything is waiting.  Maybe I'm confused, are your OSDs on SSDs?  Are
there spinning disks involved?  If so, what model(s)?

You might want to use "collectl -sD -oT" on one of the OSD nodes during the
test and see what the IO to the disks looks like during random reads, and
especially what the svctime for the disks is like.
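If collectl isn't installed on the OSD nodes, a rough stand-in (my sketch, not something from this thread) is to sample /proc/diskstats twice and compute per-disk IO counts and busy time over the interval:

```python
# Sample /proc/diskstats twice and report, per disk, the number of I/Os
# completed and the average "busy" milliseconds per I/O over the interval.
# Field layout follows the kernel's Documentation/admin-guide/iostats.rst.
import time

def read_diskstats():
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            # fields[2]: device name, fields[3]: reads completed,
            # fields[7]: writes completed, fields[12]: ms spent doing I/O
            stats[fields[2]] = (int(fields[3]), int(fields[7]), int(fields[12]))
    return stats

before = read_diskstats()
time.sleep(1)
after = read_diskstats()

for name, (r1, w1, t1) in sorted(after.items()):
    r0, w0, t0 = before.get(name, (r1, w1, t1))
    ios = (r1 - r0) + (w1 - w0)
    if ios:
        print(f"{name}: {ios} IOs, ~{(t1 - t0) / ios:.1f} ms busy per IO")
```

This is cruder than collectl's svctime, but sustained high busy time per I/O on the spinners during the random-read run would point the same way.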

Mark


Br,T

-----Original Message-----
From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
Sent: 30 June 2015 22:01
To: Tuomas Juntunen; ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Very low 4k randread performance ~1000iops

Seems reasonable.  What's the latency distribution look like in your
fio output file?  Would be useful to know if it's universally slow or
if some ops are taking much longer to complete than others.

Mark

On 06/30/2015 01:27 PM, Tuomas Juntunen wrote:
I created a file which has the following parameters


[random-read]
rw=randread
size=128m
directory=/root/asd
ioengine=libaio
bs=4k
#numjobs=8
iodepth=64
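One thing worth checking (my note, not from the thread): this job reads a 128 MB file through the filesystem, and without `direct=1` repeat runs can be served from the guest's page cache rather than the cluster. A variant that forces reads to the device could look like this (same job as above, only the name is changed):

```ini
# Same job as above, but with O_DIRECT so reads bypass the guest page cache
[random-read-direct]
rw=randread
size=128m
directory=/root/asd
ioengine=libaio
bs=4k
iodepth=64
direct=1
```

Comparing the two runs shows how much of a "fast" result is local caching.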


Br,T
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
Of Mark Nelson
Sent: 30 June 2015 20:55
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Very low 4k randread performance ~1000iops

Hi Tuomas,

Can you paste the command you ran to do the test?

Thanks,
Mark

On 06/30/2015 12:18 PM, Tuomas Juntunen wrote:
Hi

It's probably not hitting the disks, but that really doesn't matter.
The point is we have very responsive VMs while writing, and that is
what the users will see.

The iops we get with sequential reads are good, but random read is
way too low.

Is using SSDs as OSDs the only way to get it up, or is there some
tunable which would enhance it? I would assume Linux caches reads in
memory and serves them from there, but at least for now we don't see it.

Br,

Tuomas

*From:*Somnath Roy [mailto:Somnath.Roy@xxxxxxxxxxx]
*Sent:* 30 June 2015 19:24
*To:* Tuomas Juntunen; 'ceph-users'
*Subject:* RE:  Very low 4k randread performance
~1000iops

Break it down; try fio-rbd to see what performance you are getting.

But I am really surprised you are getting >100k iops for writes;
did you check that it is hitting the disks?
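To take the guest filesystem out of the picture, fio can talk to RBD directly through its rbd engine (fio must be built with librbd support). A sketch of such a job file; the pool, image, and client names below are placeholders, not taken from this thread:

```ini
# Hypothetical fio job using the rbd engine; clientname/pool/rbdname
# must match an existing cephx user and image in your cluster
[rbd-randread]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=testimage
rw=randread
bs=4k
iodepth=64
```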

Thanks & Regards

Somnath

*From:*ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] *On
Behalf Of *Tuomas Juntunen
*Sent:* Tuesday, June 30, 2015 8:33 AM
*To:* 'ceph-users'
*Subject:*  Very low 4k randread performance ~1000iops

Hi

I have been trying to figure out why our 4k random reads in VMs are
so bad. I am using fio to test this.

Write : 170k iops

Random write : 109k iops

Read : 64k iops

Random read : 1k iops

Our setup is:

3 nodes with 36 OSDs, 18 SSDs (one SSD for two OSDs); each node has
64 GB mem & 2x6-core CPUs

4 monitors running on other servers

40gbit infiniband with IPoIB

OpenStack: QEMU-KVM for the virtual machines

Any help would be appreciated

Thank you in advance.

Br,

Tuomas




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com








