I ran a ceph osd reweight-by-utilization yesterday and partway through had a
network interruption. After the network was restored the cluster continued to
rebalance, but this morning the cluster has stopped rebalancing and the status
will not change from:

# ceph status
    cluster af859ff1-c394-4c9a-95e2-0e0e4c87445c
     health HEALTH_WARN
            1 pgs degraded
            1 pgs stuck degraded
            2 pgs stuck unclean
            1 pgs stuck undersized
            1 pgs undersized
            recovery 8163/66089054 objects degraded (0.012%)
            recovery 8194/66089054 objects misplaced (0.012%)
     monmap e24: 3 mons at {mon1=10.0.231.53:6789/0,mon2=10.0.231.54:6789/0,mon3=10.0.231.55:6789/0}
            election epoch 250, quorum 0,1,2 mon1,mon2,mon3
     osdmap e184486: 100 osds: 100 up, 100 in; 1 remapped pgs
      pgmap v3010985: 4144 pgs, 7 pools, 125 TB data, 32270 kobjects
            251 TB used, 111 TB / 363 TB avail
            8163/66089054 objects degraded (0.012%)
            8194/66089054 objects misplaced (0.012%)
                4142 active+clean
                   1 active+undersized+degraded
                   1 active+remapped

# ceph health detail
HEALTH_WARN 1 pgs degraded; 1 pgs stuck degraded; 2 pgs stuck unclean; 1 pgs stuck undersized; 1 pgs undersized; recovery 8163/66089054 objects degraded (0.012%); recovery 8194/66089054 objects misplaced (0.012%)
pg 2.e7f is stuck unclean for 65125.554509, current state active+remapped, last acting [58,5]
pg 2.782 is stuck unclean for 65140.681540, current state active+undersized+degraded, last acting [76]
pg 2.782 is stuck undersized for 60568.221461, current state active+undersized+degraded, last acting [76]
pg 2.782 is stuck degraded for 60568.221549, current state active+undersized+degraded, last acting [76]
pg 2.782 is active+undersized+degraded, acting [76]
recovery 8163/66089054 objects degraded (0.012%)
recovery 8194/66089054 objects misplaced (0.012%)

# ceph pg 2.e7f query
    "recovery_state": [
        {
            "name": "Started\/Primary\/Active",
            "enter_time": "2015-08-11 15:43:09.190269",
            "might_have_unfound": [],
            "recovery_progress": {
                "backfill_targets": [],
                "waiting_on_backfill": [],
                "last_backfill_started": "0\/\/0\/\/-1",
                "backfill_info": {
                    "begin": "0\/\/0\/\/-1",
                    "end": "0\/\/0\/\/-1",
                    "objects": []
                },
                "peer_backfill_info": [],
                "backfills_in_flight": [],
                "recovering": [],
                "pg_backend": {
                    "pull_from_peer": [],
                    "pushing": []
                }
            },
            "scrub": {
                "scrubber.epoch_start": "0",
                "scrubber.active": 0,
                "scrubber.waiting_on": 0,
                "scrubber.waiting_on_whom": []
            }
        },
        {
            "name": "Started",
            "enter_time": "2015-08-11 15:43:04.955796"
        }
    ],

# ceph pg 2.782 query
    "recovery_state": [
        {
            "name": "Started\/Primary\/Active",
            "enter_time": "2015-08-11 15:42:42.178042",
            "might_have_unfound": [
                {
                    "osd": "5",
                    "status": "not queried"
                }
            ],
            "recovery_progress": {
                "backfill_targets": [],
                "waiting_on_backfill": [],
                "last_backfill_started": "0\/\/0\/\/-1",
                "backfill_info": {
                    "begin": "0\/\/0\/\/-1",
                    "end": "0\/\/0\/\/-1",
                    "objects": []
                },
                "peer_backfill_info": [],
                "backfills_in_flight": [],
                "recovering": [],
                "pg_backend": {
                    "pull_from_peer": [],
                    "pushing": []
                }
            },
            "scrub": {
                "scrubber.epoch_start": "0",
                "scrubber.active": 0,
                "scrubber.waiting_on": 0,
                "scrubber.waiting_on_whom": []
            }
        },
        {
            "name": "Started",
            "enter_time": "2015-08-11 15:42:41.139709"
        }
    ],
    "agent_state": {}

I tried restarting osd.5, osd.58 and osd.76, but there was no change. Any suggestions?
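
In case the restart method matters: the restarts were done roughly as below. This is only a sketch of the idea, assuming a sysvinit-managed Hammer-era install on the OSD hosts; the exact service invocation differs under upstart or systemd.

# run on the host that carries each daemon; repeat for osd.58 and osd.76
sudo service ceph restart osd.5      # sysvinit ceph script; under upstart this would be
                                     # something like: sudo restart ceph-osd id=5
ceph osd tree | grep ' osd\.5 '      # confirm the daemon came back up and in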
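
P.S. I can gather more output if that helps. These are the read-only checks I was planning to run next; my understanding is that none of them change cluster state (please correct me if that's wrong):

ceph pg dump_stuck unclean                   # list the stuck PGs with their acting sets
ceph pg 2.782 list_missing                   # see whether the degraded PG actually has unfound objects
ceph pg map 2.e7f                            # compare the up set vs the acting set for the remapped PG
ceph osd tree | grep -E ' osd\.(5|58|76) '   # check weight and state of the OSDs involved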