Comments inline

----- Original Message -----
From: "Sage Weil" <sweil@xxxxxxxxxx>
To: "Quenten Grasso" <qgrasso at onq.com.au>
Cc: ceph-users at lists.ceph.com
Sent: Thursday, 17 July, 2014 4:44:45 PM
Subject: Re: ceph osd crush tunables optimal AND add new OSD at the same time

On Thu, 17 Jul 2014, Quenten Grasso wrote:
> Hi Sage & List
>
> I understand this is probably a hard question to answer.
>
> I mentioned previously that our cluster has MONs co-located on the OSD
> servers, which are R515s w/ 1 x AMD 6-core processor & 11 x 3TB OSDs
> w/ dual 10GbE.
>
> When our cluster is doing these busy operations and IO has stopped, as in
> the case I mentioned earlier (setting tunables to optimal, or heavy
> recovery operations), is there a way to ensure our IO doesn't get
> completely blocked/stopped/frozen in our VMs?
>
> Could it be as simple as putting all 3 of our mon servers on bare metal
> w/ SSDs? (I recall reading somewhere that a mon disk was doing several
> thousand IOPS during a recovery operation)
>
> I assume putting just one on bare metal won't help, because our mons will
> only ever be as fast as our slowest mon server?

I don't think this is related to where the mons are (most likely). The big
question for me is whether IO is getting completely blocked, or just slowed
enough that the VMs are all timing out.

AM: I was looking at the cluster status while the rebalancing was taking
place and I was seeing very little client IO reported by the ceph -s output.
The numbers were around 20-100, whereas our typical IO for the cluster is
around 1000. Having said that, this was not enough, as _all_ of our VMs
became unresponsive and didn't recover after the rebalancing finished.

What slow request messages did you see during the rebalance?

AM: While experimenting with different options to try to get some client IO
back, I noticed that when I limited the options to 1 per OSD (osd max
backfills = 1, osd recovery max active = 1, osd recovery threads = 1), I did
not have any slow or blocked requests at all. Increasing these values did
produce some blocked requests occasionally, but they were quickly cleared.

What were the op latencies?

AM: In general, the latencies were around 5-10 times higher than normal
cluster ops. The second column of "ceph osd perf" was around 50, whereas it
is typically between 3 and 10. It did occasionally jump to some crazy
numbers like 2000-3000 on several OSDs, but only for 5-10 seconds.

It's possible there is a bug here, but it's also possible the cluster is
just operating close enough to capacity that the additional rebalancing work
pushes it into a place where it can't keep up and the IO latencies are too
high.

AM: My cluster in particular is under-utilised the majority of the time. I
do not typically see OSDs more than 20-30% utilised, and our SSD journals
are usually less than 10% utilised.

Or it could be that we just have more work to do prioritizing requests...
but it's hard to say without more info.

sage
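
AM: For reference, this is roughly how I applied the throttling described
above. It assumes a Firefly-era cluster where "osd recovery threads" is still
a valid option; the injectargs form takes effect at runtime, while the
ceph.conf entries persist across OSD restarts. The values are just the ones
quoted in this thread, not a general recommendation:

    # runtime, applied to all OSDs
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-threads 1'

    # persistent, in ceph.conf on each OSD host
    [osd]
        osd max backfills = 1
        osd recovery max active = 1
        osd recovery threads = 1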
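
AM: And the commands I was using to watch the impact while the rebalance was
in flight, nothing exotic, just the standard tooling (the second column of
ceph osd perf that I quoted should be fs_commit_latency, reported in ms):

    # overall client IO and recovery progress
    ceph -s
    ceph -w

    # per-OSD commit/apply latencies
    ceph osd perf

    # any requests currently blocked past the complaint threshold
    ceph health detail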