Thanks Mark.

Are there any plans for a ZFS-like L2ARC in Ceph, or is cache tiering what is meant to fill that role in the future? I have tested a cache tier in front of an EC pool, but it created too much load on our servers, so it was not viable for us. I was also wondering whether EnhanceIO would be a good solution for getting more random IOPS; I've read some of Sébastien's writings.

Br,
Tuomas
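For reference, the kind of cache tiering discussed in this thread is set up roughly as follows. This is a minimal sketch: the pool names "ecpool" and "cachepool", the PG count, and the byte limit are placeholders that would have to be adapted to the actual cluster.

    # replicated pool on the SSDs that will act as the cache tier
    # (1024 PGs is a placeholder; size for the real cluster)
    ceph osd pool create cachepool 1024 1024 replicated
    # attach it in front of the erasure-coded base pool
    ceph osd tier add ecpool cachepool
    ceph osd tier cache-mode cachepool writeback
    # direct client traffic at the tier
    ceph osd tier set-overlay ecpool cachepool
    # hit-set tracking and a size cap so the tier can flush and evict
    ceph osd pool set cachepool hit_set_type bloom
    ceph osd pool set cachepool target_max_bytes 1099511627776

As Mark notes below, whether such a tier helps depends heavily on how much of the working set fits into the cache, since promotions on a miss are expensive.
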
-----Original Message-----
From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
Sent: 1 July 2015 20:29
To: Tuomas Juntunen; ceph-users@xxxxxxxxxxxxxx
Subject: Re: Very low 4k randread performance ~1000iops

On 07/01/2015 12:13 PM, Tuomas Juntunen wrote:
> Hi
>
> Yes, the OSDs are on spinning disks and we have 18 SSDs for journals,
> one SSD per two OSDs.
>
> The OSDs are:
> Model Family: Seagate Barracuda 7200.14 (AF)
> Device Model: ST2000DM001-1CH164
>
> As I understand it, the journals are not used as a read cache at all,
> just for writes. Would an SSD-based cache pool be a viable solution here?

Ok, so that makes more sense. The performance is still lower than expected, but maybe 3-4x rather than several orders of magnitude.

My guess is that cache tiering in its current form probably won't help you much unless you have a workload that fits mostly into the cache. The promotion penalty is really high, though, so we will likely have to promote much more slowly than we currently do.

Mark

> Br, T
>
> -----Original Message-----
> From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
> Sent: 1 July 2015 13:58
> To: Tuomas Juntunen; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Very low 4k randread performance ~1000iops
>
> On 06/30/2015 10:42 PM, Tuomas Juntunen wrote:
>> Hi
>>
>> For sequential reads, here are the latencies:
>> lat (usec) : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.02%, 100=0.03%
>> lat (usec) : 250=1.02%, 500=87.09%, 750=7.47%, 1000=1.50%
>> lat (msec) : 2=0.76%, 4=1.72%, 10=0.19%, 20=0.19%
>>
>> Random reads:
>> lat (usec) : 10=0.01%
>> lat (msec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=0.03%, 50=0.55%
>> lat (msec) : 100=99.31%, 250=0.08%
>>
>> 100 msec seems like a lot to me.
>
> It is, but what's more interesting imho is that it's so consistent. You
> don't have some ops completing fast and others completing slowly and
> holding everything up. It's as if the OSDs are simply overloaded with
> concurrent IOs and everything is waiting. Maybe I'm confused: are your
> OSDs on SSDs? Are there spinning disks involved? If so, what model(s)?
>
> You might want to run "collectl -sD -oT" on one of the OSD nodes during
> the test and see what the IO to the disks looks like during random reads,
> and especially what the svctime for the disks is like.
>
> Mark
>
>> Br, T
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
>> Sent: 30 June 2015 22:01
>> To: Tuomas Juntunen; ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: Very low 4k randread performance ~1000iops
>>
>> Seems reasonable. What does the latency distribution look like in your
>> fio output file? It would be useful to know whether it's universally
>> slow or whether some ops take much longer to complete than others.
>>
>> Mark
>>
>> On 06/30/2015 01:27 PM, Tuomas Juntunen wrote:
>>> I created a file with the following parameters:
>>>
>>> [random-read]
>>> rw=randread
>>> size=128m
>>> directory=/root/asd
>>> ioengine=libaio
>>> bs=4k
>>> #numjobs=8
>>> iodepth=64
>>>
>>> Br, T
>>>
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
>>> Of Mark Nelson
>>> Sent: 30 June 2015 20:55
>>> To: ceph-users@xxxxxxxxxxxxxx
>>> Subject: Re: Very low 4k randread performance ~1000iops
>>>
>>> Hi Tuomas,
>>>
>>> Can you paste the command you ran to do the test?
>>>
>>> Thanks,
>>> Mark
>>>
>>> On 06/30/2015 12:18 PM, Tuomas Juntunen wrote:
>>>> Hi
>>>>
>>>> It's probably not hitting the disks, but that really doesn't matter.
>>>> The point is that we have very responsive VMs while writing, and that
>>>> is what the users will see.
>>>>
>>>> The iops we get with sequential reads are good, but random reads are
>>>> way too low.
>>>>
>>>> Is using SSDs as OSDs the only way to get them up, or is there some
>>>> tunable that would improve things? I would assume Linux caches reads
>>>> in memory and serves them from there, but at least for now we don't
>>>> see that.
>>>>
>>>> Br,
>>>>
>>>> Tuomas
>>>>
>>>> *From:* Somnath Roy [mailto:Somnath.Roy@xxxxxxxxxxx]
>>>> *Sent:* 30 June 2015 19:24
>>>> *To:* Tuomas Juntunen; 'ceph-users'
>>>> *Subject:* RE: Very low 4k randread performance ~1000iops
>>>>
>>>> Break it down; try fio-rbd to see what performance you are getting.
>>>>
>>>> But I am really surprised you are getting > 100k iops for writes.
>>>> Did you check that they are hitting the disks?
>>>>
>>>> Thanks & Regards
>>>>
>>>> Somnath
>>>>
>>>> *From:* ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
>>>> *On Behalf Of* Tuomas Juntunen
>>>> *Sent:* Tuesday, June 30, 2015 8:33 AM
>>>> *To:* 'ceph-users'
>>>> *Subject:* Very low 4k randread performance ~1000iops
>>>>
>>>> Hi
>>>>
>>>> I have been trying to figure out why our 4k random reads in VMs are
>>>> so bad. I am using fio to test this.
>>>>
>>>> Write: 170k iops
>>>> Random write: 109k iops
>>>> Read: 64k iops
>>>> Random read: 1k iops
>>>>
>>>> Our setup is:
>>>>
>>>> 3 nodes with 36 OSDs and 18 SSDs (one SSD per two OSDs); each node
>>>> has 64 GB of memory and 2 x 6-core CPUs
>>>>
>>>> 4 monitors running on other servers
>>>>
>>>> 40 Gbit InfiniBand with IPoIB
>>>>
>>>> OpenStack: QEMU-KVM for the virtual machines
>>>>
>>>> Any help would be appreciated.
>>>>
>>>> Thank you in advance.
>>>>
>>>> Br,
>>>>
>>>> Tuomas
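
For reference, Somnath's suggestion to break the problem down with fio-rbd amounts to running fio with its rbd ioengine so that librbd talks to the cluster directly, taking QEMU and the guest out of the path. A minimal job file might look like the following sketch; it assumes fio was built with rbd support, a cephx user "admin", and a pre-created test image "fiotest" in pool "rbd" (both names are placeholders):

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fiotest
    invalidate=0    # needed with some older fio versions and the rbd engine
    bs=4k

    [rbd-randread]
    rw=randread
    iodepth=64

Comparing the 4k randread result from a job like this against the in-VM numbers would show how much of the ~1000 iops ceiling comes from the cluster itself versus the virtualization layer.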