Hi

I'll check the possibility of testing EnhanceIO. I'll report back on this.

Thanks

Br,
T

-----Original Message-----
From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
Sent: 1 July 2015 21:51
To: Tuomas Juntunen; ceph-users@xxxxxxxxxxxxxx
Subject: Re: Very low 4k randread performance ~1000iops

On 07/01/2015 01:39 PM, Tuomas Juntunen wrote:
> Thanks Mark
>
> Are there any plans for a ZFS-like L2ARC in Ceph, or is cache tiering
> what should work like this in the future?
>
> I have tested cache tier + EC pool, and that created too much load on
> our servers, so it was not viable to be used.

We are doing a lot of work in this space right now. Hopefully we'll see
improvements in the coming releases.

>
> I was also wondering if EnhanceIO would be a good solution for getting
> more random iops. I've read some of Sébastien's writings.

Possibly! Try it and let us know. ;)

>
> Br,
> Tuomas
>
>
> -----Original Message-----
> From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
> Sent: 1 July 2015 20:29
> To: Tuomas Juntunen; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Very low 4k randread performance ~1000iops
>
> On 07/01/2015 12:13 PM, Tuomas Juntunen wrote:
>> Hi
>>
>> Yes, the OSDs are on spinning disks and we have 18 SSDs for the
>> journals, one SSD for two OSDs.
>>
>> The OSDs are:
>> Model Family: Seagate Barracuda 7200.14 (AF)
>> Device Model: ST2000DM001-1CH164
>>
>> As I've understood it, the journals are not used as a read cache at
>> all, just for writing. Would an SSD-based cache pool be a viable
>> solution here?
>
> Ok, so that makes more sense. The performance is still lower than
> expected, but maybe 3-4x rather than several orders of magnitude. My
> guess is that cache tiering in its current form probably won't help
> you much unless you have a workload that fits mostly into the cache.
> The promotion penalty is really high though, so we likely will have to
> promote much more slowly than we currently do.
>
> Mark
>
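For context, the cache tier + EC pool arrangement mentioned above is normally wired up along the following lines; this is only a sketch, and the pool names and limits are illustrative placeholders, not values from this cluster:

    # Hypothetical pools: "ecpool" (erasure-coded data) and "hotcache" (replicated SSD tier).
    ceph osd tier add ecpool hotcache
    ceph osd tier cache-mode hotcache writeback
    ceph osd tier set-overlay ecpool hotcache
    # Hit-set and size limits bound how aggressively the tier promotes and flushes.
    ceph osd pool set hotcache hit_set_type bloom
    ceph osd pool set hotcache hit_set_count 1
    ceph osd pool set hotcache hit_set_period 3600
    ceph osd pool set hotcache target_max_bytes 500000000000

As noted above, whether this helps depends mostly on how much of the working set actually fits in the cache tier.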
>>
>> Br, T
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
>> Sent: 1 July 2015 13:58
>> To: Tuomas Juntunen; ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: Very low 4k randread performance ~1000iops
>>
>> On 06/30/2015 10:42 PM, Tuomas Juntunen wrote:
>>> Hi
>>>
>>> For seq reads, here are the latencies:
>>> lat (usec) : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.02%, 100=0.03%
>>> lat (usec) : 250=1.02%, 500=87.09%, 750=7.47%, 1000=1.50%
>>> lat (msec) : 2=0.76%, 4=1.72%, 10=0.19%, 20=0.19%
>>>
>>> Random reads:
>>> lat (usec) : 10=0.01%
>>> lat (msec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=0.03%, 50=0.55%
>>> lat (msec) : 100=99.31%, 250=0.08%
>>>
>>> 100 msec seems a lot to me.
>>
>> It is, but what's more interesting imho is that it's so consistent.
>> You don't have some ops completing fast and other ones completing
>> slowly, holding everything up. It's like the OSDs are simply
>> overloaded with concurrent IOs and everything is waiting. Maybe I'm
>> confused, are your OSDs on SSDs? Are there spinning disks involved?
>> If so, what model(s)?
>>
>> You might want to use "collectl -sD -oT" on one of the OSD nodes
>> during the test and see what the IO to the disk looks like during
>> random reads, and especially what the svctime for the disks is like.
>>
>> Mark
>>
>>>
>>> Br,T
>>>
>>> -----Original Message-----
>>> From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
>>> Sent: 30 June 2015 22:01
>>> To: Tuomas Juntunen; ceph-users@xxxxxxxxxxxxxx
>>> Subject: Re: Very low 4k randread performance ~1000iops
>>>
>>> Seems reasonable. What does the latency distribution look like in
>>> your fio output file? It would be useful to know if it's universally
>>> slow or if some ops are taking much longer to complete than others.
>>>
>>> Mark
>>>
>>> On 06/30/2015 01:27 PM, Tuomas Juntunen wrote:
>>>> I created a file which has the following parameters:
>>>>
>>>> [random-read]
>>>> rw=randread
>>>> size=128m
>>>> directory=/root/asd
>>>> ioengine=libaio
>>>> bs=4k
>>>> #numjobs=8
>>>> iodepth=64
>>>>
>>>> Br,T
>>>>
>>>> -----Original Message-----
>>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
>>>> Behalf Of Mark Nelson
>>>> Sent: 30 June 2015 20:55
>>>> To: ceph-users@xxxxxxxxxxxxxx
>>>> Subject: Re: Very low 4k randread performance ~1000iops
>>>>
>>>> Hi Tuomas,
>>>>
>>>> Can you paste the command you ran to do the test?
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>> On 06/30/2015 12:18 PM, Tuomas Juntunen wrote:
>>>>> Hi
>>>>>
>>>>> It's probably not hitting the disks, but that really doesn't
>>>>> matter. The point is we have very responsive VMs while writing,
>>>>> and that is what the users will see.
>>>>>
>>>>> The iops we get with sequential reads are good, but the random
>>>>> read is way too low.
>>>>>
>>>>> Is using SSDs as OSDs the only way to get it up, or is there some
>>>>> tunable which would enhance it? I would assume Linux caches reads
>>>>> in memory and serves them from there, but at least now we don't
>>>>> see it.
>>>>>
>>>>> Br,
>>>>>
>>>>> Tuomas
>>>>>
>>>>> From: Somnath Roy [mailto:Somnath.Roy@xxxxxxxxxxx]
>>>>> Sent: 30 June 2015 19:24
>>>>> To: Tuomas Juntunen; 'ceph-users'
>>>>> Subject: RE: Very low 4k randread performance ~1000iops
>>>>>
>>>>> Break it down, try fio-rbd to see what performance you are getting.
>>>>>
>>>>> But, I am really surprised you are getting over 100k iops for
>>>>> write; did you check whether it is hitting the disks?
>>>>>
>>>>> Thanks & Regards
>>>>>
>>>>> Somnath
>>>>>
>>>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
>>>>> Behalf Of Tuomas Juntunen
>>>>> Sent: Tuesday, June 30, 2015 8:33 AM
>>>>> To: 'ceph-users'
>>>>> Subject: Very low 4k randread performance ~1000iops
>>>>>
>>>>> Hi
>>>>>
>>>>> I have been trying to figure out why our 4k random reads in VMs
>>>>> are so bad. I am using fio to test this.
>>>>>
>>>>> Write : 170k iops
>>>>> Random write : 109k iops
>>>>> Read : 64k iops
>>>>> Random read : 1k iops
>>>>>
>>>>> Our setup is:
>>>>>
>>>>> 3 nodes with 36 OSDs and 18 SSDs, one SSD for two OSDs; each node
>>>>> has 64 GB mem & 2x 6-core CPUs
>>>>>
>>>>> 4 monitors running on other servers
>>>>>
>>>>> 40 Gbit InfiniBand with IPoIB
>>>>>
>>>>> OpenStack: qemu-kvm for virtuals
>>>>>
>>>>> Any help would be appreciated.
>>>>>
>>>>> Thank you in advance.
>>>>>
>>>>> Br,
>>>>>
>>>>> Tuomas
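For reference, the fio-rbd test Somnath suggests above uses fio's rbd ioengine and runs directly against an RBD image from a client node, which takes the guest OS and qemu out of the picture. A minimal sketch, assuming fio is built with rbd support; the pool, image, and client names are placeholders, not values from this thread:

    # "rbd" (pool), "fio-test" (image) and "admin" (cephx client) are hypothetical names.
    fio --name=rbd-4k-randread \
        --ioengine=rbd --clientname=admin --pool=rbd --rbdname=fio-test \
        --rw=randread --bs=4k --iodepth=64 --numjobs=1 \
        --runtime=60 --time_based

If this also tops out around 1k iops, the bottleneck is likely on the cluster side rather than in the VM stack; if it is much faster, the guest or qemu configuration deserves a closer look.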
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com