16 osds: 11 up, 16 in

I already have osd_max_backfills = 1 and osd_recovery_op_priority = 1.

osd_recovery_max_active is still at the default of 15, so I'll try 
lowering that to 1...  Some OSDs timed out during the injectargs, so I 
added the setting to ceph.conf and restarted them all.
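
For reference, the stanza I added to ceph.conf looks roughly like this 
(a sketch of what I ended up with, not a paste of my actual config):

    [osd]
        # throttle recovery/backfill so peering and client IO can breathe
        osd max backfills = 1
        osd recovery max active = 1
        osd recovery op priority = 1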

I was running RadosGW-Agent, but it's shut down now.  I disabled scrub 
and deep-scrub as well, so all of the disk I/O is dedicated to recovery 
now.
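
Concretely, disabling the scrubs is just the two cluster flags, which 
I'll unset once recovery finishes:

    ceph osd set noscrub
    ceph osd set nodeep-scrub
    # later: ceph osd unset noscrub ; ceph osd unset nodeep-scrub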

15 minutes after the restart:
2014-05-07 13:03:19.249179 mon.0 [INF] osd.5 marked down after no pg 
stats for 901.601323seconds

One of the OSDs (osd.5) never completed the peering process; it looks 
like it locked up immediately after the restart.  As soon as osd.5 
started peering, it went to exactly 100% CPU, and the other OSDs 
started complaining that it wasn't responding to subops.  (The 
~900-second window in that log line matches the monitor's default 
mon_osd_report_timeout: the mon never saw pg stats from osd.5 after 
the restart, so it marked it down.)
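
If it locks up like that again, I'll try pulling its in-flight ops off 
the admin socket before the mon kicks it out, something along these 
lines (assuming the default socket path):

    # which ops is osd.5 stuck on?
    ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok dump_ops_in_flight
    # internal counters, for latency/queue evidence
    ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok perf dump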

On 5/7/14 11:45, Mike Dawson wrote:
> Craig,
>
> I suspect the disks in question are seeking constantly and the spindle 
> contention is causing significant latency. A strategy of throttling 
> backfill/recovery and reducing client traffic tends to work for me.
>
> 1) You should make sure recovery and backfill are throttled:
> ceph tell osd.* injectargs '--osd_max_backfills 1'
> ceph tell osd.* injectargs '--osd_recovery_max_active 1'
> ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
>
> 2) We run a not-particularly-critical service with a constant stream 
> of small, random IO (95% write / 5% read).  During recovery/backfill, 
> we are heavily bound by IOPS.  If step 1 isn't enough, it often feels 
> like a net win to throttle non-essential client traffic to get 
> spindle contention under control.
>
> If that all fails, you can try "ceph osd set nodown", which will 
> prevent OSDs from being marked down (with or without proper cause), 
> but that tends to cause me more trouble than it's worth.
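>
> If you do try it, remember to clear the flag once things settle:
>
>   ceph osd unset nodown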
>
> Thanks,
>
> Mike Dawson
> Co-Founder & Director of Cloud Architecture
> Cloudapt LLC
> 6330 East 75th Street, Suite 170
> Indianapolis, IN 46250
>
> On 5/7/2014 1:28 PM, Craig Lewis wrote:
>> The 5 down OSDs have all been kicked out for being unresponsive, and
>> they're getting kicked faster than they can complete
>> recovery+backfill.  The number of degraded PGs is growing over time.
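>>
>> "ceph health detail" expands that HEALTH_WARN summary, which helps
>> pin down which PGs and OSDs are behind the blocked requests:
>>
>>     ceph health detail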
>>
>> root@ceph0c:~# ceph -w
>>      cluster 1604ec7a-6ceb-42fc-8c68-0a7896c4e120
>>       health HEALTH_WARN 49 pgs backfill; 926 pgs degraded; 252 pgs
>> down; 30 pgs incomplete; 291 pgs peering; 1 pgs recovery_wait; 175 pgs
>> stale; 255 pgs stuck inactive; 175 pgs stuck stale; 1234 pgs stuck
>> unclean; 66 requests are blocked > 32 sec; recovery 6820014/38055556
>> objects degraded (17.921%); 4/16 in osds are down; noout flag(s) set
>>       monmap e2: 2 mons at
>> {ceph0c=10.193.0.6:6789/0,ceph1c=10.193.0.7:6789/0}, election epoch 238,
>> quorum 0,1 ceph0c,ceph1c
>>       osdmap e38673: 16 osds: 12 up, 16 in
>>              flags noout
>>        pgmap v7325233: 2560 pgs, 17 pools, 14090 GB data, 18581 kobjects
>>              28456 GB used, 31132 GB / 59588 GB avail
>>              6820014/38055556 objects degraded (17.921%)
>>                     1 stale+active+clean+scrubbing+deep
>>                    15 active
>>                  1247 active+clean
>>                     1 active+recovery_wait
>>                    45 stale+active+clean
>>                    39 peering
>>                    29 stale+active+degraded+wait_backfill
>>                   252 down+peering
>>                   827 active+degraded
>>                    50 stale+active+degraded
>>                    20 stale+active+degraded+remapped+wait_backfill
>>                    30 stale+incomplete
>>                     4 active+clean+scrubbing+deep
>>
>> Here's a snippet of ceph.log for one of these OSDs:
>> 2014-05-07 09:22:46.747036 mon.0 10.193.0.6:6789/0 39981 : [INF] osd.3
>> marked down after no pg stats for 901.212859seconds
>> 2014-05-07 09:47:17.930251 mon.0 10.193.0.6:6789/0 40561 : [INF] osd.3
>> 10.193.0.6:6812/2830 boot
>> 2014-05-07 09:47:16.914519 osd.3 10.193.0.6:6812/2830 823 : [WRN] map
>> e38649 wrongly marked me down
>>
>> root@ceph0c:~# uname -a
>> Linux ceph0c 3.5.0-46-generic #70~precise1-Ubuntu SMP Thu Jan 9 23:55:12
>> UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>> root@ceph0c:~# lsb_release -a
>> No LSB modules are available.
>> Distributor ID:    Ubuntu
>> Description:    Ubuntu 12.04.4 LTS
>> Release:    12.04
>> Codename:    precise
>> root@ceph0c:~# ceph -v
>> ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>>
>>
>> Any ideas what I can do to make these OSDs stop dying after 15 minutes?


-- 

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email clewis@centraldesktop.com <mailto:clewis@centraldesktop.com>

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/>  | Twitter 
<http://www.twitter.com/centraldesktop>  | Facebook 
<http://www.facebook.com/CentralDesktop>  | LinkedIn 
<http://www.linkedin.com/groups?gid=147417>  | Blog 
<http://cdblog.centraldesktop.com/>
