On Tue, Dec 12, 2017 at 12:33 PM Nick Fisk <nick@xxxxxxxxxx> wrote:
> Did that fix anything? I don't see anything immediately obvious but I'm
> not practiced in quickly reading that pg state output. What's the output
> of "ceph -s"?

Hi Greg,

No, restarting OSDs didn't seem to help, but I did make some progress late last night. Stopping osd.68 unlocks the cluster and IO can progress. However, as soon as it starts back up, 0.1cf and a couple of other PGs again get stuck in an activating state. If I out the OSD, either with it up or down, some other PGs get hit by the same problem as CRUSH moves PG mappings around to other OSDs. So there definitely seems to be some sort of weird peering issue somewhere.

I have seen a very similar issue on this cluster before: after running the crush reweight script to balance OSD utilization, a weight got set too low and PGs were unable to peer. I'm not convinced that's what's happening here, as none of the weights have changed, but I intend to explore it further just in case.

With osd.68 down:

    pgs: 1071783/48650631 objects degraded (2.203%)
         5923 active+clean
         399  active+undersized+degraded
         7    active+clean+scrubbing+deep
         7    active+clean+remapped

With it up:

    pgs: 0.047% pgs not active
         67271/48651279 objects degraded (0.138%)
         15602/48651279 objects misplaced (0.032%)
         6051 active+clean
         273  active+recovery_wait+degraded
         4    active+clean+scrubbing+deep
         4    active+remapped+backfill_wait
         3    activating+remapped
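For reference, this is roughly the sequence I've been using to toggle it (a sketch assuming systemd-managed OSDs; adjust the unit name for your deployment):

    # Stopping osd.68 unblocks client IO
    systemctl stop ceph-osd@68
    ceph -s

    # Bringing it back up leaves 0.1cf and a couple of others stuck activating
    systemctl start ceph-osd@68
    ceph pg dump_stuck inactive

    # Outing it (whether up or down) just moves the problem to other PGs
    ceph osd out 68

    # Sanity-check CRUSH weights, given the earlier reweight incident
    ceph osd df tree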
PG dump:

    ceph pg dump | grep activatin
    dumped all
    2.389    0 0 0 0 0           0 1500 1500 activating+remapped 2017-12-13 11:08:50.990526   76271'34230  106239:160310 [68,60,58,59,29,23] 68 [62,60,58,59,29,23] 62   76271'34230 2017-12-13 09:00:08.359690   76271'34230 2017-12-10 10:05:10.931366
    0.1cf 3947 0 0 0 0 16472186880 1577 1577 activating+remapped 2017-12-13 11:08:50.641034 106236'7512915 106239:6176548 [34,68,8]           34 [34,8,53]           34 106138'7512682 2017-12-13 10:27:37.400613 106138'7512682 2017-12-13 10:27:37.400613
    2.210    0 0 0 0 0           0 1500 1500 activating+remapped 2017-12-13 11:08:50.686193   76271'33304   106239:96797 [68,67,34,36,16,15] 68 [62,67,34,36,16,15] 62   76271'33304 2017-12-12 00:49:21.038437   76271'33304 2017-12-10 16:05:12.751425
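My next step is to query one of the stuck PGs directly to see what peering is waiting on; something like the below should show it (standard pg query, output omitted here):

    ceph pg 0.1cf query
    # inspect the "recovery_state" section for the peering/activating
    # stage and any "blocked_by" entries (e.g. pointing at osd.68)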
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com