[Single OSD performance on SSD] Can't go over 3, 2K IOPS

Somnath.Roy@xxxxxxxxxxx (Somnath Roy) · Fri, 29 Aug 2014 06:37:45 +0000

Thanks, Haomai!

Here is some of the data from my setup.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Set up:

--------

32-core CPU with HT enabled, 128 GB RAM, one SSD (holding both journal and data) -> one OSD. 5 client machines, each with a 12-core CPU and running two instances of ceph_smalliobench (10 clients total). Network is 10GbE.

Workload:

-------------

Small workload: 20K objects of 4K each, and io_size is also 4K random read (RR). The intent is to serve the IOs from memory so that the test can uncover performance problems within a single OSD.

Results from Firefly:

--------------------------

Single-client throughput is ~14K IOPS, but the aggregate throughput does not increase as the number of clients increases: 10 clients give ~15K IOPS. ~9-10 CPU cores are used.

Result with latest master:

------------------------------

Single-client throughput is ~14K IOPS, but it scales as the number of clients increases: 10 clients give ~107K IOPS. ~25 CPU cores are used.
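
As a rough cross-check of the Firefly and master figures for this in-memory test, the scaling factor and the per-core throughput can be worked out in a few lines of Python. This is only a sketch using the approximate numbers quoted above; the Firefly core count of 9.5 is an assumed midpoint of "~9-10", and the helper names are made up for illustration.

```python
# Figures quoted above for the in-memory 4K random-read test.
# Core counts are approximate; 9.5 is an assumed midpoint of "~9-10".
firefly = {"clients_1": 14_000, "clients_10": 15_000, "cores": 9.5}
master = {"clients_1": 14_000, "clients_10": 107_000, "cores": 25.0}


def scaling(result):
    """Ratio of 10-client to 1-client aggregate IOPS."""
    return result["clients_10"] / result["clients_1"]


def iops_per_core(result):
    """10-client aggregate IOPS divided by the cores in use."""
    return result["clients_10"] / result["cores"]


for name, result in (("firefly", firefly), ("master", master)):
    print(f"{name}: scaling x{scaling(result):.2f}, "
          f"{iops_per_core(result):.0f} IOPS/core")

# How much more aggregate throughput master delivers at 10 clients.
speedup = master["clients_10"] / firefly["clients_10"]
print(f"master vs firefly at 10 clients: x{speedup:.1f}")
```

With these inputs, Firefly barely scales from 1 to 10 clients, while master scales by roughly 7.6 and also does more work per core in use.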

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

More realistic workload:

-----------------------------

Let's see how it performs when > 90% of the IOs are served from disk.

Setup:

-------

A 40-core server with 64 GB RAM as the cluster node (single-node cluster). 8 SSDs -> 8 OSDs. One similar node for the monitor and rgw. Another node for the client, running fio/vdbench. 4 rbds are configured with the 'noshare' option. The network is 40GbE.

Workload:

------------

All 8 SSDs are populated, so 8 * 800 GB = ~6.4 TB of data. io_size = 4K RR.
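
To see why this data set forces most reads to disk, the sizes above can be compared directly. The sketch below assumes decimal units, uniformly random reads over the whole data set, and (generously) that all 64 GB of RAM acts as cache for object data; none of these assumptions is stated in the setup, so treat the result as a bound rather than a measurement.

```python
# Sizes from the setup above; decimal units (1 TB = 1000 GB) are assumed.
ssd_count = 8
ssd_size_gb = 800
ram_gb = 64

dataset_gb = ssd_count * ssd_size_gb          # 6400 GB, i.e. ~6.4 TB

# Upper bound: pretend every byte of RAM is cache holding object data.
max_cached_fraction = ram_gb / dataset_gb
min_disk_fraction = 1 - max_cached_fraction

print(f"dataset: {dataset_gb} GB")
print(f"at most {max_cached_fraction:.1%} of the data fits in RAM")
print(f"at least {min_disk_fraction:.1%} of uniform random reads hit the SSDs")
```

Under these assumptions only about 1% of the data can be cached, which is consistent with more than 90% of the IOs being served from disk.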

Results from Firefly:

------------------------

Aggregate output with 4 rbd clients stressing the cluster in parallel is ~20-25K IOPS; ~8-10 CPU cores are used (maybe less, I can't remember precisely).

Results from latest master:

--------------------------------

Aggregate output with 4 rbd clients stressing the cluster in parallel is ~120K IOPS; the CPU is 7% idle, i.e. ~37-38 CPU cores are in use.
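
The conversion from idle percentage to busy cores, and the gain over Firefly, can be sketched as below. It takes the 40 cores from the setup at face value (whether that number counts hyperthreads is not stated) and uses the approximate IOPS figures quoted above, so the results are rough.

```python
total_cores = 40          # from the setup above
idle_fraction = 0.07      # "cpu is 7% idle"

# Cores kept busy on master: 40 * 0.93, roughly 37.
busy_cores = total_cores * (1 - idle_fraction)

# Aggregate IOPS quoted for the 4-client rbd test.
firefly_iops = (20_000, 25_000)   # quoted range, low to high
master_iops = 120_000

print(f"busy cores on master: {busy_cores:.1f}")
print(f"master IOPS per busy core: {master_iops / busy_cores:.0f}")

# Dividing by the high Firefly figure gives the low end of the gain,
# and dividing by the low figure gives the high end.
low, high = (master_iops / x for x in reversed(firefly_iops))
print(f"master vs firefly: x{low:.1f} to x{high:.1f}")
```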

Hope this helps.

Thanks & Regards

Somnath

-----Original Message-----
From: Haomai Wang [mailto:haomaiwang@xxxxxxxxx]
Sent: Thursday, August 28, 2014 8:01 PM
To: Somnath Roy
Cc: Andrey Korolyov; ceph-users at lists.ceph.com
Subject: Re: [Single OSD performance on SSD] Can't go over 3, 2K IOPS

Hi Roy,

I have already looked through your merged code for "fdcache" and "optimizing for lfn_find/lfn_open". Could you share some performance improvement data for it? I fully agree with your direction. Do you have any updates on it?

As for the messenger level, I have some very early work on it (https://github.com/yuyuyu101/ceph/tree/msg-event). It contains a new messenger implementation that supports different event mechanisms.

It looks like it will take at least one more week to make it work.

On Fri, Aug 29, 2014 at 5:48 AM, Somnath Roy <Somnath.Roy at sandisk.com> wrote:

> Yes, from what I saw, the messenger-level bottleneck is still huge!

> Hopefully the RDMA messenger will resolve that, and the performance gain will be significant for reads (on SSDs). For writes, we need to uncover the OSD bottlenecks first to take advantage of the improved upstream.

> In my experience, the performance improvement is not visible until you remove the very last bottleneck. That can be confusing, because you might think that the upstream improvement you made is not valid (which is not the case).

>

> Thanks & Regards

> Somnath

> -----Original Message-----

> From: Andrey Korolyov [mailto:andrey at xdel.ru]

> Sent: Thursday, August 28, 2014 12:57 PM

> To: Somnath Roy

> Cc: David Moreau Simard; Mark Nelson; ceph-users at lists.ceph.com

> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

>

> On Thu, Aug 28, 2014 at 10:48 PM, Somnath Roy <Somnath.Roy at sandisk.com> wrote:

>> No, I guess this will not be backported to Firefly.

>>

>> Thanks & Regards

>> Somnath

>>

>

> Thanks for sharing this. The first thing that came to mind when I looked at this thread was your patches :)

>

> If Giant incorporates them, both the RDMA support and those patches should give a huge performance boost for RDMA-enabled Ceph backnets.

--

Best Regards,

Wheat