The 5 OSDs that are down have all been kicked out for being unresponsive. The 5 OSDs are getting kicked faster than they can complete the recovery+backfill. The number of degraded PGs is growing over time.

root@ceph0c:~# ceph -w
    cluster 1604ec7a-6ceb-42fc-8c68-0a7896c4e120
     health HEALTH_WARN 49 pgs backfill; 926 pgs degraded; 252 pgs down; 30 pgs incomplete; 291 pgs peering; 1 pgs recovery_wait; 175 pgs stale; 255 pgs stuck inactive; 175 pgs stuck stale; 1234 pgs stuck unclean; 66 requests are blocked > 32 sec; recovery 6820014/38055556 objects degraded (17.921%); 4/16 in osds are down; noout flag(s) set
     monmap e2: 2 mons at {ceph0c=10.193.0.6:6789/0,ceph1c=10.193.0.7:6789/0}, election epoch 238, quorum 0,1 ceph0c,ceph1c
     osdmap e38673: 16 osds: 12 up, 16 in
            flags noout
      pgmap v7325233: 2560 pgs, 17 pools, 14090 GB data, 18581 kobjects
            28456 GB used, 31132 GB / 59588 GB avail
            6820014/38055556 objects degraded (17.921%)
                   1 stale+active+clean+scrubbing+deep
                  15 active
                1247 active+clean
                   1 active+recovery_wait
                  45 stale+active+clean
                  39 peering
                  29 stale+active+degraded+wait_backfill
                 252 down+peering
                 827 active+degraded
                  50 stale+active+degraded
                  20 stale+active+degraded+remapped+wait_backfill
                  30 stale+incomplete
                   4 active+clean+scrubbing+deep

Here's a snippet of ceph.log for one of these OSDs:

2014-05-07 09:22:46.747036 mon.0 10.193.0.6:6789/0 39981 : [INF] osd.3 marked down after no pg stats for 901.212859seconds
2014-05-07 09:47:17.930251 mon.0 10.193.0.6:6789/0 40561 : [INF] osd.3 10.193.0.6:6812/2830 boot
2014-05-07 09:47:16.914519 osd.3 10.193.0.6:6812/2830 823 : [WRN] map e38649 wrongly marked me down

root@ceph0c:~# uname -a
Linux ceph0c 3.5.0-46-generic #70~precise1-Ubuntu SMP Thu Jan 9 23:55:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

root@ceph0c:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.4 LTS
Release: 12.04
Codename: precise

root@ceph0c:~# ceph -v
ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)

Any ideas what I can do to make these OSDs stop dying after 15 minutes?

-- 
Craig Lewis
Senior Systems Engineer
Office +1.714.602.1309
Email clewis at centraldesktop.com
Central Desktop