I ran a ceph osd reweight-by-utilization yesterday and partway through had a
network interruption. After the network was restored the cluster continued to
rebalance, but this morning the cluster has stopped rebalancing and the status
will not change from:

# ceph status
    cluster af859ff1-c394-4c9a-95e2-0e0e4c87445c
     health HEALTH_WARN
            1 pgs degraded
            1 pgs stuck degraded
            2 pgs stuck unclean
            1 pgs stuck undersized
            1 pgs undersized
            recovery 8163/66089054 objects degraded (0.012%)
            recovery 8194/66089054 objects misplaced (0.012%)
     monmap e24: 3 mons at {mon1=10.0.231.53:6789/0,mon2=10.0.231.54:6789/0,mon3=10.0.231.55:6789/0}
            election epoch 250, quorum 0,1,2 mon1,mon2,mon3
     osdmap e184486: 100 osds: 100 up, 100 in; 1 remapped pgs
      pgmap v3010985: 4144 pgs, 7 pools, 125 TB data, 32270 kobjects
            251 TB used, 111 TB / 363 TB avail
            8163/66089054 objects degraded (0.012%)
            8194/66089054 objects misplaced (0.012%)
                4142 active+clean
                   1 active+undersized+degraded
                   1 active+remapped

# ceph health detail
HEALTH_WARN 1 pgs degraded; 1 pgs stuck degraded; 2 pgs stuck unclean; 1 pgs stuck undersized; 1 pgs undersized; recovery 8163/66089054 objects degraded (0.012%); recovery 8194/66089054 objects misplaced (0.012%)
pg 2.e7f is stuck unclean for 65125.554509, current state active+remapped, last acting [58,5]
pg 2.782 is stuck unclean for 65140.681540, current state active+undersized+degraded, last acting [76]
pg 2.782 is stuck undersized for 60568.221461, current state active+undersized+degraded, last acting [76]
pg 2.782 is stuck degraded for 60568.221549, current state active+undersized+degraded, last acting [76]
pg 2.782 is active+undersized+degraded, acting [76]
recovery 8163/66089054 objects degraded (0.012%)
recovery 8194/66089054 objects misplaced (0.012%)

# ceph pg 2.e7f query
    "recovery_state": [
        {
            "name": "Started\/Primary\/Active",
            "enter_time": "2015-08-11 15:43:09.190269",
            "might_have_unfound": [],
            "recovery_progress": {
                "backfill_targets": [],
                "waiting_on_backfill": [],
                "last_backfill_started": "0\/\/0\/\/-1",
                "backfill_info": {
                    "begin": "0\/\/0\/\/-1",
                    "end": "0\/\/0\/\/-1",
                    "objects": []
                },
                "peer_backfill_info": [],
                "backfills_in_flight": [],
                "recovering": [],
                "pg_backend": {
                    "pull_from_peer": [],
                    "pushing": []
                }
            },
            "scrub": {
                "scrubber.epoch_start": "0",
                "scrubber.active": 0,
                "scrubber.waiting_on": 0,
                "scrubber.waiting_on_whom": []
            }
        },
        {
            "name": "Started",
            "enter_time": "2015-08-11 15:43:04.955796"
        }
    ],

# ceph pg 2.782 query
    "recovery_state": [
        {
            "name": "Started\/Primary\/Active",
            "enter_time": "2015-08-11 15:42:42.178042",
            "might_have_unfound": [
                {
                    "osd": "5",
                    "status": "not queried"
                }
            ],
            "recovery_progress": {
                "backfill_targets": [],
                "waiting_on_backfill": [],
                "last_backfill_started": "0\/\/0\/\/-1",
                "backfill_info": {
                    "begin": "0\/\/0\/\/-1",
                    "end": "0\/\/0\/\/-1",
                    "objects": []
                },
                "peer_backfill_info": [],
                "backfills_in_flight": [],
                "recovering": [],
                "pg_backend": {
                    "pull_from_peer": [],
                    "pushing": []
                }
            },
            "scrub": {
                "scrubber.epoch_start": "0",
                "scrubber.active": 0,
                "scrubber.waiting_on": 0,
                "scrubber.waiting_on_whom": []
            }
        },
        {
            "name": "Started",
            "enter_time": "2015-08-11 15:42:41.139709"
        }
    ],
    "agent_state": {}

I tried restarting osd.5, osd.58 and osd.76, but there was no change. Any suggestions?
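
In case the restart method matters: the restarts were done roughly as below. This is only a sketch of the idea, assuming a sysvinit-managed Hammer-era install on the OSD hosts; the exact service invocation differs under upstart or systemd.

# run on the host that carries each daemon; repeat for osd.58 and osd.76
sudo service ceph restart osd.5      # sysvinit ceph script; under upstart this would be
                                     # something like: sudo restart ceph-osd id=5
ceph osd tree | grep ' osd\.5 '      # confirm the daemon came back up and in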
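
P.S. I can gather more output if that helps. These are the read-only checks I was planning to run next; my understanding is that none of them change cluster state (please correct me if that's wrong):

ceph pg dump_stuck unclean                   # list the stuck PGs with their acting sets
ceph pg 2.782 list_missing                   # see whether the degraded PG actually has unfound objects
ceph pg map 2.e7f                            # compare the up set vs the acting set for the remapped PG
ceph osd tree | grep -E ' osd\.(5|58|76) '   # check weight and state of the OSDs involved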