Hi,

What is min_size for this pool? Maybe you need to decrease it for the cluster to start recovering.

Arvydas
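For example, something along these lines (just a sketch; 'rbd' below is a placeholder for whichever pool backs these PGs):

$ ceph osd pool get rbd size        # replica count for the pool
$ ceph osd pool get rbd min_size    # minimum replicas a PG needs to serve I/O
$ ceph osd pool set rbd min_size 2  # only worth trying if min_size is currently equal to size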
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Scott Laird

I lost a few OSDs recently. Now my cluster is unhealthy and I can't figure out how to get it healthy again.

OSDs 3, 7, 10, and 40 died in a power outage. Now I have 10 PGs that are down+incomplete, but all of them seem like they should have surviving replicas of all data. I'm running 9.2.0.

$ ceph health detail | grep down
pg 18.c1 is down+incomplete, acting [11,18,9]
pg 18.47 is down+incomplete, acting [11,9,22]
pg 18.1d7 is down+incomplete, acting [5,31,24]
pg 18.1d6 is down+incomplete, acting [22,11,5]
pg 18.2af is down+incomplete, acting [19,24,18]
pg 18.2dd is down+incomplete, acting [15,11,22]
pg 18.2de is down+incomplete, acting [15,17,11]
pg 18.3e is down+incomplete, acting [25,8,18]
pg 18.3d6 is down+incomplete, acting [22,39,24]
pg 18.3e6 is down+incomplete, acting [9,23,8]

$ ceph pg 18.c1 query
{
    "state": "down+incomplete",
    "snap_trimq": "[]",
    "epoch": 960905,
    "up": [ 11, 18, 9 ],
    "acting": [ 11, 18, 9 ],
    "info": {
        "pgid": "18.c1",
        "last_update": "0'0",
        "last_complete": "0'0",
        "log_tail": "0'0",
        "last_user_version": 0,
        "last_backfill": "MAX",
        "last_backfill_bitwise": 0,
        "purged_snaps": "[]",
        "history": {
            "epoch_created": 595523,
            "last_epoch_started": 954170,
            "last_epoch_clean": 954170,
            "last_epoch_split": 0,
            "last_epoch_marked_full": 0,
            "same_up_since": 959988,
            "same_interval_since": 959988,
            "same_primary_since": 959988,
            "last_scrub": "613947'7736",
            "last_scrub_stamp": "2015-11-11 21:18:35.118057",
            "last_deep_scrub": "613947'7736",
            "last_deep_scrub_stamp": "2015-11-11 21:18:35.118057",
            "last_clean_scrub_stamp": "2015-11-11 21:18:35.118057"
        },
        ...
        "probing_osds": [ "9", "11", "18", "23", "25" ],
        "down_osds_we_would_probe": [ 7, 10 ],
        "peering_blocked_by": []
        },
        {
            "name": "Started",
            "enter_time": "2016-02-09 20:35:57.627376"
        }
    ],
    "agent_state": {}
}

I tried replacing disks. I created new OSDs 3 and 7, but neither will start up; the ceph-osd process starts but never actually makes it to 'up', with nothing obvious in the logs. I can post logs if that helps. Since the OSDs were removed a few days ago, 'ceph osd lost' doesn't seem to help.

Is there a way to fix these PGs and get my cluster healthy again?

Scott
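P.S. In case the exact commands matter, this is roughly what I ran against osd.7 (a sketch from memory rather than a copy-paste; the same applies to the other dead OSDs):

$ ceph pg 18.c1 query | grep -A 4 down_osds_we_would_probe   # still lists 7 and 10
$ ceph osd lost 7 --yes-i-really-mean-it                     # no visible effect, osd.7 was already removed from the osdmap
$ ceph osd tree | grep osd.7                                 # the recreated osd.7 appears but never comes 'up'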