Re: Performance question

Marek Dohojda <mdohojda@xxxxxxxxxxxxxxxxxxx> · Tue, 24 Nov 2015 11:46:45 -0700

I dunno, I think I just go into my Lotus and mull this over ;) (I wish)

This is storage for KVM, and we have quite a few boxes.  While none are suffering from IO load right now, I am seeing slowdowns personally and know that sooner or later others will notice as well.
I think what I will do is remove the SSD from the cluster, and put journals on it.  

On Tue, Nov 24, 2015 at 11:42 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
Separate would be best, but as with many things in life we are not all driving around in sports cars!! 

Moving the journals to the SSDs that are also OSDs themselves will be fine. SSDs tend to be more bandwidth-limited than IOPS-limited, and the reverse is true for disks, so you will get maybe a 2x improvement for the disk pool and you probably won’t even notice the impact on the SSD pool.

Can I just ask what your workload will be? There may be other things that can be done.

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Marek Dohojda
Sent: 24 November 2015 18:32
To: Alan Johnson <alanj@xxxxxxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx; Nick Fisk <nick@xxxxxxxxxx>

Subject: Re:  Performance question

Thank you! I will do that.  Would you suggest getting another SSD drive or moving the journal to the SSD OSD?

(Sorry for a stupid question, if that is such).

On Tue, Nov 24, 2015 at 11:25 AM, Alan Johnson <alanj@xxxxxxxxxxxxxx> wrote:
Or separate the journals, as this will bring the workload on the spinners down to 3X rather than 6X.

From: Marek Dohojda [mailto:mdohojda@xxxxxxxxxxxxxxxxxxx] 
Sent: Tuesday, November 24, 2015 1:24 PM
To: Nick Fisk
Cc: Alan Johnson; ceph-users@xxxxxxxxxxxxxx

Subject: Re:  Performance question

Crad I think you are 100% correct:

rrqm/s  wrqm/s    r/s      w/s   rkB/s      wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm   %util
  0.00  369.00  33.00  1405.00  132.00  135656.00    188.86      5.61   4.02    21.94     3.60   0.70  100.00
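For anyone reading along, here is a minimal Python sketch of how that iostat line can be read. It assumes the header and values come from a single device line of `iostat -x` with the device column cut off, that kB means 1024 bytes, and that avgrq-sz is reported in 512-byte sectors. The helper name `parse_iostat` and the 95% utilization cutoff are hypothetical choices for illustration, not anything iostat itself defines.

```python
# Header and values copied from the iostat output quoted in this thread.
IOSTAT_HEADER = ("rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz "
                 "await r_await w_await svctm %util")
IOSTAT_VALUES = ("0.00 369.00 33.00 1405.00 132.00 135656.00 188.86 5.61 "
                 "4.02 21.94 3.60 0.70 100.00")


def parse_iostat(header: str, values: str) -> dict:
    """Pair each column name with its numeric value (hypothetical helper)."""
    names = header.split()
    numbers = [float(v) for v in values.split()]
    if len(names) != len(numbers):
        raise ValueError("header and value column counts differ")
    return dict(zip(names, numbers))


stats = parse_iostat(IOSTAT_HEADER, IOSTAT_VALUES)

# Write throughput actually hitting this one device, in MB/s (1024 kB per MB).
write_mb_s = stats["wkB/s"] / 1024

# Average request size in kB, derived from throughput and request rate.
# Doubling it gives 512-byte sectors, which should agree with avgrq-sz.
avg_request_kb = (stats["rkB/s"] + stats["wkB/s"]) / (stats["r/s"] + stats["w/s"])
avg_request_sectors = avg_request_kb * 2

# Assumed cutoff: treat the device as saturated at or above 95% utilization.
SATURATION_THRESHOLD_PCT = 95.0
saturated = stats["%util"] >= SATURATION_THRESHOLD_PCT

print(f"device writes: {write_mb_s:.1f} MB/s")
print(f"average request: {avg_request_kb:.1f} kB ({avg_request_sectors:.2f} sectors)")
print(f"saturated: {saturated}")
```

The derived request size lining up with the reported avgrq-sz is a quick sanity check that the columns were paired correctly; the 100% utilization is the number that says the disk is the limit.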

I was kind of wondering whether this may be the case, which is why I was wondering if I should be doing much in terms of troubleshooting.

So basically what you are saying is that I need to wait for a new version?

Thank you very much everybody!

On Tue, Nov 24, 2015 at 9:35 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
You haven’t stated what size replication you are running. Keep in mind that with a replication factor of 3, you will be writing 6x the amount of data to the disks compared with what the benchmark reports (3x replication x 2 for the data + journal writes).
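The 6x arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a replication factor of 3 (not confirmed in this thread), each replica written once to the journal and once to the data store when both share a disk, and only once to the data disk when the journal lives on a separate device. The function names are hypothetical.

```python
def backend_write_multiplier(replicas: int, journal_on_same_disk: bool) -> int:
    """How many times each client byte is written to the data disks.

    Assumption: a journal co-located with the data doubles the writes per
    replica; a journal on a separate device leaves one write per replica.
    """
    journal_factor = 2 if journal_on_same_disk else 1
    return replicas * journal_factor


def backend_write_mb_s(client_mb_s: float, replicas: int,
                       journal_on_same_disk: bool) -> float:
    """Aggregate write load on the data disks for a given client write rate."""
    return client_mb_s * backend_write_multiplier(replicas, journal_on_same_disk)


# Example: 50 MB/s of client writes with an assumed replication factor of 3.
colocated = backend_write_mb_s(50.0, replicas=3, journal_on_same_disk=True)
separate = backend_write_mb_s(50.0, replicas=3, journal_on_same_disk=False)

print(f"journal on the same disks: {colocated:.0f} MB/s hitting the disks")
print(f"journal on separate SSDs:  {separate:.0f} MB/s hitting the disks")
```

With those assumptions the load on the spinners halves once the journals move off them, which is why a benchmark figure can look low while the disks are already busy.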

You might actually be near the hardware maximums. What does iostat look like whilst you are running rados bench? Are the disks getting maxed out?

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Marek Dohojda
Sent: 24 November 2015 16:27
To: Alan Johnson <alanj@xxxxxxxxxxxxxx>

Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Performance question

7 total servers, 20 Gig pipe between servers, both reads and writes.  The network itself has plenty of headroom left; it is averaging 40 Mbit/s.

Rados Bench SAS 30 writes
 Total time run:         30.591927
Total writes made:      386
Write size:             4194304
Bandwidth (MB/sec):     50.471 

Stddev Bandwidth:       48.1052
Max bandwidth (MB/sec): 160
Min bandwidth (MB/sec): 0
Average Latency:        1.25908
Stddev Latency:         2.62018
Max latency:            21.2809
Min latency:            0.029227

Rados Bench SSD writes
 Total time run:         20.425192
Total writes made:      1405
Write size:             4194304
Bandwidth (MB/sec):     275.150 

Stddev Bandwidth:       122.565
Max bandwidth (MB/sec): 576
Min bandwidth (MB/sec): 0
Average Latency:        0.231803
Stddev Latency:         0.190978
Max latency:            0.981022
Min latency:            0.0265421

As you can see SSD is better but not as much as I would expect SSD to be. 
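As a sanity check on the two runs above, here is a small Python sketch that recomputes the bandwidth from the reported totals and compares the pools. It assumes rados bench reports MB/sec using 1 MB = 1024 * 1024 bytes, which matches the figures quoted here; the function name is hypothetical.

```python
MIB = 1024 * 1024  # assumed size of the "MB" used in the rados bench summary


def bench_bandwidth_mb_s(total_writes: int, write_size_bytes: int,
                         total_seconds: float) -> float:
    """Average write bandwidth: total bytes written divided by elapsed time."""
    return total_writes * write_size_bytes / MIB / total_seconds


# Totals copied from the two rados bench summaries in this message.
sas_mb_s = bench_bandwidth_mb_s(386, 4194304, 30.591927)
ssd_mb_s = bench_bandwidth_mb_s(1405, 4194304, 20.425192)

# How much faster the SSD pool was than the SAS pool in these runs.
ssd_over_sas = ssd_mb_s / sas_mb_s

print(f"SAS pool: {sas_mb_s:.3f} MB/s")
print(f"SSD pool: {ssd_mb_s:.3f} MB/s")
print(f"SSD / SAS: {ssd_over_sas:.2f}x")
```

Both recomputed values agree with the reported Bandwidth lines, so the summaries are internally consistent, and the SSD pool comes out at a bit over five times the SAS pool for these particular runs.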

On Tue, Nov 24, 2015 at 9:10 AM, Alan Johnson <alanj@xxxxxxxxxxxxxx> wrote:
Hard to know without more config details such as the number of servers and the network (GigE or 10 GigE). I am also not sure how you are measuring (reads or writes); you could try RADOS bench as a baseline. I would expect more performance with 7 x 10K spinners journaled to SSDs. The fact that the SSDs did not perform much better may point to a bottleneck elsewhere, network perhaps?

From: Marek Dohojda [mailto:mdohojda@xxxxxxxxxxxxxxxxxxx] 
Sent: Tuesday, November 24, 2015 10:37 AM
To: Alan Johnson
Cc: Haomai Wang; ceph-users@xxxxxxxxxxxxxx

Subject: Re: [ceph-users] Performance question

Yeah, they are; that is one thing I was planning on changing. What I am really interested in at the moment is a rough idea of expected performance.  I mean, is 100MB around normal, very low, or "could be better"?

On Tue, Nov 24, 2015 at 8:02 AM, Alan Johnson <alanj@xxxxxxxxxxxxxx> wrote:
Are the journals on the same device? It might be better to use the SSDs for journaling, since you are not getting better performance with SSDs.

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Marek Dohojda
Sent: Monday, November 23, 2015 10:24 PM
To: Haomai Wang
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Performance question

Sorry, I should have specified that SAS is the 100 MB :), but to be honest SSD isn't much faster.

On Mon, Nov 23, 2015 at 7:38 PM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
On Tue, Nov 24, 2015 at 10:35 AM, Marek Dohojda
<mdohojda@xxxxxxxxxxxxxxxxxxx> wrote:
> No SSD and SAS are in two separate pools.
>
> On Mon, Nov 23, 2015 at 7:30 PM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
>>
>> On Tue, Nov 24, 2015 at 10:23 AM, Marek Dohojda
>> <mdohojda@xxxxxxxxxxxxxxxxxxx> wrote:
>> > I have a Hammer Ceph cluster on 7 nodes with total 14 OSDs.  7 of which
>> > are
>> > SSD and 7 of which are SAS 10K drives.  I get typically about 100MB IO
>> > rates
>> > on this cluster.

So which pool do you get 100 MB with?

>>
>> You mixed up sas and ssd in one pool?
>>
>> >
>> > I have a simple question.  Is 100MB within my configuration what I
>> > should
>> > expect, or should it be higher? I am not sure if I should be looking for
>> > issues, or just accept what I have.
>> >
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat
>
>
--
Best Regards,

Wheat

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com