I lost a few OSDs recently. Now my cluster is unhealthy and I can't figure out how to get it healthy again.
OSDs 3, 7, 10, and 40 died in a power outage. Now I have 10 PGs that are down+incomplete, even though each of them looks like it should still have surviving replicas of all its data.
I'm running 9.2.0.
$ ceph health detail | grep down
pg 18.c1 is down+incomplete, acting [11,18,9]
pg 18.47 is down+incomplete, acting [11,9,22]
pg 18.1d7 is down+incomplete, acting [5,31,24]
pg 18.1d6 is down+incomplete, acting [22,11,5]
pg 18.2af is down+incomplete, acting [19,24,18]
pg 18.2dd is down+incomplete, acting [15,11,22]
pg 18.2de is down+incomplete, acting [15,17,11]
pg 18.3e is down+incomplete, acting [25,8,18]
pg 18.3d6 is down+incomplete, acting [22,39,24]
pg 18.3e6 is down+incomplete, acting [9,23,8]
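In case it helps anyone cross-check the list above, here is a rough Python sketch that parses those `ceph health detail` lines and tallies which OSDs show up in the acting sets. The regular expression and the names `parse_down_pgs` and `FAILED_OSDS` are my own for illustration, and the line format is assumed to match exactly what is pasted above.

```python
import re
from collections import Counter

# Output pasted from `ceph health detail | grep down` above.
HEALTH_DETAIL = """\
pg 18.c1 is down+incomplete, acting [11,18,9]
pg 18.47 is down+incomplete, acting [11,9,22]
pg 18.1d7 is down+incomplete, acting [5,31,24]
pg 18.1d6 is down+incomplete, acting [22,11,5]
pg 18.2af is down+incomplete, acting [19,24,18]
pg 18.2dd is down+incomplete, acting [15,11,22]
pg 18.2de is down+incomplete, acting [15,17,11]
pg 18.3e is down+incomplete, acting [25,8,18]
pg 18.3d6 is down+incomplete, acting [22,39,24]
pg 18.3e6 is down+incomplete, acting [9,23,8]
"""

# Assumed line shape: "pg <pgid> is <state>, acting [a,b,c]"
LINE_RE = re.compile(r"^pg (\S+) is (\S+), acting \[([\d,]*)\]$")

# The OSDs that died in the power outage.
FAILED_OSDS = {3, 7, 10, 40}


def parse_down_pgs(text):
    """Return {pgid: [acting OSD ids]} for every line that matches LINE_RE."""
    pgs = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            pgid, _state, acting = match.groups()
            pgs[pgid] = [int(osd) for osd in acting.split(",") if osd]
    return pgs


pgs = parse_down_pgs(HEALTH_DETAIL)

# How often each surviving OSD appears across the affected PGs.
osd_counts = Counter(osd for acting in pgs.values() for osd in acting)

# PGs whose acting set still references one of the dead OSDs.
still_on_failed = {
    pgid: sorted(FAILED_OSDS & set(acting))
    for pgid, acting in pgs.items()
    if FAILED_OSDS & set(acting)
}

print("affected PGs:", len(pgs))
print("most common acting OSDs:", osd_counts.most_common(3))
print("PGs still mapped to a failed OSD:", still_on_failed)
```

This only summarizes what the health output already says; it does not query the cluster or change anything.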
$ ceph pg 18.c1 query
{
    "state": "down+incomplete",
    "snap_trimq": "[]",
    "epoch": 960905,
    "up": [
        11,
        18,
        9
    ],
    "acting": [
        11,
        18,
        9
    ],
    "info": {
        "pgid": "18.c1",
        "last_update": "0'0",
        "last_complete": "0'0",
        "log_tail": "0'0",
        "last_user_version": 0,
        "last_backfill": "MAX",
        "last_backfill_bitwise": 0,
        "purged_snaps": "[]",
        "history": {
            "epoch_created": 595523,
            "last_epoch_started": 954170,
            "last_epoch_clean": 954170,
            "last_epoch_split": 0,
            "last_epoch_marked_full": 0,
            "same_up_since": 959988,
            "same_interval_since": 959988,
            "same_primary_since": 959988,
            "last_scrub": "613947'7736",
            "last_scrub_stamp": "2015-11-11 21:18:35.118057",
            "last_deep_scrub": "613947'7736",
            "last_deep_scrub_stamp": "2015-11-11 21:18:35.118057",
            "last_clean_scrub_stamp": "2015-11-11 21:18:35.118057"
        },
    ...
            "probing_osds": [
                "9",
                "11",
                "18",
                "23",
                "25"
            ],
            "down_osds_we_would_probe": [
                7,
                10
            ],
            "peering_blocked_by": []
        },
        {
            "name": "Started",
            "enter_time": "2016-02-09 20:35:57.627376"
        }
    ],
    "agent_state": {}
}
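The part of the query output that stands out to me is "down_osds_we_would_probe". A minimal Python sketch for pulling that list out of the full `ceph pg <pgid> query` JSON is below. It searches the whole document for the key rather than assuming where it is nested, since I trimmed the output above. The helper names and the `SAMPLE` document are made up for illustration, and the `recovery_state` key in the sample is only my assumption about the enclosing field.

```python
import json


def find_key(node, key):
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def down_osds_we_would_probe(query_json):
    """Return the sorted, de-duplicated OSD ids listed under down_osds_we_would_probe."""
    doc = json.loads(query_json)
    down = set()
    for value in find_key(doc, "down_osds_we_would_probe"):
        down.update(int(osd) for osd in value)
    return sorted(down)


# Trimmed stand-in for the query output above; "recovery_state" is an assumed key name.
SAMPLE = json.dumps({
    "state": "down+incomplete",
    "up": [11, 18, 9],
    "acting": [11, 18, 9],
    "recovery_state": [
        {
            "probing_osds": ["9", "11", "18", "23", "25"],
            "down_osds_we_would_probe": [7, 10],
            "peering_blocked_by": [],
        },
        {"name": "Started", "enter_time": "2016-02-09 20:35:57.627376"},
    ],
})

print(down_osds_we_would_probe(SAMPLE))
```

Running the same function over the saved query output of each of the 10 PGs should show whether they are all waiting on the same dead OSDs.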
I tried replacing the disks. I created new OSDs 3 and 7, but neither will come up: the ceph-osd process starts but never reaches 'up', and there is nothing obvious in the logs. I can post logs if that helps. Since the OSDs were removed a few days ago, 'ceph osd lost' doesn't seem to help.
Is there a way to fix these PGs and get my cluster healthy again?
Scott