Thanks Mark.

Are there any plans for a ZFS-like L2ARC in Ceph, or is cache tiering what is meant to fill that role in the future? I have tested a cache tier in front of an EC pool, but it created too much load on our servers, so it was not viable for us. I was also wondering whether EnhanceIO would be a good solution for getting more random IOPS; I've read some of Sébastien's writings.

Br,
Tuomas
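For reference, the kind of cache tiering discussed in this thread is set up roughly as follows. This is a minimal sketch: the pool names "ecpool" and "cachepool", the PG count, and the byte limit are placeholders that would have to be adapted to the actual cluster.

    # replicated pool on the SSDs that will act as the cache tier
    # (1024 PGs is a placeholder; size for the real cluster)
    ceph osd pool create cachepool 1024 1024 replicated
    # attach it in front of the erasure-coded base pool
    ceph osd tier add ecpool cachepool
    ceph osd tier cache-mode cachepool writeback
    # direct client traffic at the tier
    ceph osd tier set-overlay ecpool cachepool
    # hit-set tracking and a size cap so the tier can flush and evict
    ceph osd pool set cachepool hit_set_type bloom
    ceph osd pool set cachepool target_max_bytes 1099511627776

As Mark notes below, whether such a tier helps depends heavily on how much of the working set fits into the cache, since promotions on a miss are expensive.
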
-----Original Message-----
From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
Sent: 1 July 2015 20:29
To: Tuomas Juntunen; ceph-users@xxxxxxxxxxxxxx
Subject: Re: Very low 4k randread performance ~1000iops

On 07/01/2015 12:13 PM, Tuomas Juntunen wrote:
> Hi
>
> Yes, the OSDs are on spinning disks and we have 18 SSDs for journals,
> one SSD per two OSDs.
>
> The OSDs are:
> Model Family: Seagate Barracuda 7200.14 (AF)
> Device Model: ST2000DM001-1CH164
>
> As I understand it, the journals are not used as a read cache at all,
> just for writes. Would an SSD-based cache pool be a viable solution here?

Ok, so that makes more sense. The performance is still lower than expected, but maybe 3-4x rather than several orders of magnitude.

My guess is that cache tiering in its current form probably won't help you much unless you have a workload that fits mostly into the cache. The promotion penalty is really high, though, so we will likely have to promote much more slowly than we currently do.

Mark

> Br, T
>
> -----Original Message-----
> From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
> Sent: 1 July 2015 13:58
> To: Tuomas Juntunen; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Very low 4k randread performance ~1000iops
>
> On 06/30/2015 10:42 PM, Tuomas Juntunen wrote:
>> Hi
>>
>> For sequential reads, here are the latencies:
>> lat (usec) : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.02%, 100=0.03%
>> lat (usec) : 250=1.02%, 500=87.09%, 750=7.47%, 1000=1.50%
>> lat (msec) : 2=0.76%, 4=1.72%, 10=0.19%, 20=0.19%
>>
>> Random reads:
>> lat (usec) : 10=0.01%
>> lat (msec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=0.03%, 50=0.55%
>> lat (msec) : 100=99.31%, 250=0.08%
>>
>> 100 msec seems like a lot to me.
>
> It is, but what's more interesting imho is that it's so consistent. You
> don't have some ops completing fast and others completing slowly and
> holding everything up. It's as if the OSDs are simply overloaded with
> concurrent IOs and everything is waiting. Maybe I'm confused: are your
> OSDs on SSDs? Are there spinning disks involved? If so, what model(s)?
>
> You might want to run "collectl -sD -oT" on one of the OSD nodes during
> the test and see what the IO to the disks looks like during random reads,
> and especially what the svctime for the disks is like.
>
> Mark
>
>> Br, T
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
>> Sent: 30 June 2015 22:01
>> To: Tuomas Juntunen; ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: Very low 4k randread performance ~1000iops
>>
>> Seems reasonable. What does the latency distribution look like in your
>> fio output file? It would be useful to know whether it's universally
>> slow or whether some ops take much longer to complete than others.
>>
>> Mark
>>
>> On 06/30/2015 01:27 PM, Tuomas Juntunen wrote:
>>> I created a file with the following parameters:
>>>
>>> [random-read]
>>> rw=randread
>>> size=128m
>>> directory=/root/asd
>>> ioengine=libaio
>>> bs=4k
>>> #numjobs=8
>>> iodepth=64
>>>
>>> Br, T
>>>
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
>>> Of Mark Nelson
>>> Sent: 30 June 2015 20:55
>>> To: ceph-users@xxxxxxxxxxxxxx
>>> Subject: Re: Very low 4k randread performance ~1000iops
>>>
>>> Hi Tuomas,
>>>
>>> Can you paste the command you ran to do the test?
>>>
>>> Thanks,
>>> Mark
>>>
>>> On 06/30/2015 12:18 PM, Tuomas Juntunen wrote:
>>>> Hi
>>>>
>>>> It's probably not hitting the disks, but that really doesn't matter.
>>>> The point is that we have very responsive VMs while writing, and that
>>>> is what the users will see.
>>>>
>>>> The iops we get with sequential reads are good, but random reads are
>>>> way too low.
>>>>
>>>> Is using SSDs as OSDs the only way to get them up, or is there some
>>>> tunable that would improve things? I would assume Linux caches reads
>>>> in memory and serves them from there, but at least for now we don't
>>>> see that.
>>>>
>>>> Br,
>>>>
>>>> Tuomas
>>>>
>>>> *From:* Somnath Roy [mailto:Somnath.Roy@xxxxxxxxxxx]
>>>> *Sent:* 30 June 2015 19:24
>>>> *To:* Tuomas Juntunen; 'ceph-users'
>>>> *Subject:* RE: Very low 4k randread performance ~1000iops
>>>>
>>>> Break it down; try fio-rbd to see what performance you are getting.
>>>>
>>>> But I am really surprised you are getting > 100k iops for writes.
>>>> Did you check that they are hitting the disks?
>>>>
>>>> Thanks & Regards
>>>>
>>>> Somnath
>>>>
>>>> *From:* ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
>>>> *On Behalf Of* Tuomas Juntunen
>>>> *Sent:* Tuesday, June 30, 2015 8:33 AM
>>>> *To:* 'ceph-users'
>>>> *Subject:* Very low 4k randread performance ~1000iops
>>>>
>>>> Hi
>>>>
>>>> I have been trying to figure out why our 4k random reads in VMs are
>>>> so bad. I am using fio to test this.
>>>>
>>>> Write: 170k iops
>>>> Random write: 109k iops
>>>> Read: 64k iops
>>>> Random read: 1k iops
>>>>
>>>> Our setup is:
>>>>
>>>> 3 nodes with 36 OSDs and 18 SSDs (one SSD per two OSDs); each node
>>>> has 64 GB of memory and 2 x 6-core CPUs
>>>>
>>>> 4 monitors running on other servers
>>>>
>>>> 40 Gbit InfiniBand with IPoIB
>>>>
>>>> OpenStack: QEMU-KVM for the virtual machines
>>>>
>>>> Any help would be appreciated.
>>>>
>>>> Thank you in advance.
>>>>
>>>> Br,
>>>>
>>>> Tuomas
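
For reference, Somnath's suggestion to break the problem down with fio-rbd amounts to running fio with its rbd ioengine so that librbd talks to the cluster directly, taking QEMU and the guest out of the path. A minimal job file might look like the following sketch; it assumes fio was built with rbd support, a cephx user "admin", and a pre-created test image "fiotest" in pool "rbd" (both names are placeholders):

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fiotest
    invalidate=0    # needed with some older fio versions and the rbd engine
    bs=4k

    [rbd-randread]
    rw=randread
    iodepth=64

Comparing the 4k randread result from a job like this against the in-VM numbers would show how much of the ~1000 iops ceiling comes from the cluster itself versus the virtualization layer.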