Re-packaging this question, which was buried in a larger, less specific thread from a couple of days ago; hoping it will be more useful here.

We have been working on restoring our Ceph cluster after losing a large number of OSDs. All PGs are active now except for 80 PGs that are stuck in the "incomplete" state. These PGs reference osd.8, which we removed 2 weeks ago due to corruption. We would like to abandon the "incomplete" PGs, as they are not restorable. We have tried several things already (including the "ceph osd lost" attempt shown below), with no luck.
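(For anyone who wants to reproduce our view of the problem, commands along these lines show the stuck PGs and which OSDs they reference. The PG ID below is just an example, not one of our real PGs:)

    # list the PGs stuck inactive and pick out the "incomplete" ones
    ceph pg dump_stuck inactive | grep incomplete

    # inspect one of them; this is the sort of output where we still see
    # references to the removed osd.8
    ceph pg 1.28 query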
How do we abandon these PGs so that recovery can continue? Is there some way to force individual PGs to be marked as "lost"?

====

Some miscellaneous data below:

djakubiec@dev:~$ ceph osd lost 8 --yes-i-really-mean-it
osd.8 is not down or doesn't exist

djakubiec@dev:~$ ceph osd tree
ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 58.19960 root default
-2  7.27489     host node24
 1  7.27489         osd.1        up  1.00000          1.00000
-3  7.27489     host node25
 2  7.27489         osd.2        up  1.00000          1.00000
-4  7.27489     host node26
 3  7.27489         osd.3        up  1.00000          1.00000
-5  7.27489     host node27
 4  7.27489         osd.4        up  1.00000          1.00000
-6  7.27489     host node28
 5  7.27489         osd.5        up  1.00000          1.00000
-7  7.27489     host node29
 6  7.27489         osd.6        up  1.00000          1.00000
-8  7.27539     host node30
 9  7.27539         osd.9        up  1.00000          1.00000
-9  7.27489     host node31
 7  7.27489         osd.7        up  1.00000          1.00000

BUT, even though OSD 8 no longer exists, I still see lots of references to OSD 8 in various ceph dumps and queries.

Interestingly, we do still see weird entries in the CRUSH map (should I do something about these?):

# devices
device 0 device0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 device8
device 9 osd.9

And for what it is worth, here is the output of ceph -s:

    cluster 10d47013-8c2a-40c1-9b4a-214770414234
     health HEALTH_ERR
            212 pgs are stuck inactive for more than 300 seconds
            93 pgs backfill_wait
            1 pgs backfilling
            101 pgs degraded
            63 pgs down
            80 pgs incomplete
            89 pgs inconsistent
            4 pgs recovery_wait
            1 pgs repair
            132 pgs stale
            80 pgs stuck inactive
            132 pgs stuck stale
            103 pgs stuck unclean
            97 pgs undersized
            2 requests are blocked > 32 sec
            recovery 4394354/46343776 objects degraded (9.482%)
            recovery 4025310/46343776 objects misplaced (8.686%)
            2157 scrub errors
            mds cluster is degraded
     monmap e1: 3 mons at {core=10.0.1.249:6789/0,db=10.0.1.251:6789/0,dev=10.0.1.250:6789/0}
            election epoch 266, quorum 0,1,2 core,dev,db
      fsmap e3627: 1/1/1 up {0=core=up:replay}
     osdmap e4293: 8 osds: 8 up, 8 in; 144 remapped pgs
            flags sortbitwise
      pgmap v1866639: 744 pgs, 10 pools, 7668 GB data, 20673 kobjects
            8339 GB used, 51257 GB / 59596 GB avail
            4394354/46343776 objects degraded (9.482%)
            4025310/46343776 objects misplaced (8.686%)
                 362 active+clean
                 112 stale+active+clean
                  89 active+undersized+degraded+remapped+wait_backfill
                  66 active+clean+inconsistent
                  63 down+incomplete
                  19 stale+active+clean+inconsistent
                  17 incomplete
                   5 active+undersized+degraded+remapped
                   4 active+recovery_wait+degraded
                   2 active+undersized+degraded+remapped+inconsistent+wait_backfill
                   1 stale+active+clean+scrubbing+deep+inconsistent+repair
                   1 active+remapped+inconsistent+wait_backfill
                   1 active+clean+scrubbing+deep
                   1 active+remapped+wait_backfill
                   1 active+undersized+degraded+remapped+backfilling

Thanks,

-- Dan
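P.S. In case it helps frame the question, these are the sorts of commands we have been wondering about. We have not confirmed that any of them is safe (or even applicable) for an "incomplete" PG, and 1.28 below is just a placeholder PG ID:

    # mark any unfound objects in the PG as lost (may only apply to "unfound", not "incomplete")
    ceph pg 1.28 mark_unfound_lost delete

    # recreate the PG as empty, abandoning whatever data it held
    ceph pg force_create_pg 1.28

    # with the owning OSD daemon stopped, mark the PG complete in that OSD's store
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 --pgid 1.28 --op mark-complete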