Hi everyone,

We somehow got our cluster really messed up. We had one node down due to a failing system disk. While we were working to bring it back, a few OSDs started crashing and kept crashing, so we stopped them. That would be a story for another thread, though.

Now we have a few unfound objects, which we are OK with losing. But we also have two incomplete PGs: one on a pool of size 2, the other on an erasure-coded pool (12+4). They are stuck. I am OK with losing the data, but cannot figure out how to get rid of them. All requests to objects inside them are blocked, driving the whole cluster to a halt.

Thanks

# ceph health detail | grep incomplete
HEALTH_WARN 679 pgs backfill; 2 pgs backfilling; 3141 pgs degraded; 2 pgs incomplete; 2488 pgs recovery_wait; 3141 pgs stuck degraded; 2 pgs stuck inactive; 3171 pgs stuck unclean; 1226 pgs stuck undersized; 1226 pgs undersized; 103 requests are blocked > 32 sec; 2 osds have slow requests; recovery 15237140/686754017 objects degraded (2.219%); recovery 23314256/686754017 objects misplaced (3.395%); recovery 79/102730138 unfound (0.000%); noout,noscrub,nodeep-scrub flag(s) set
pg 19.e8d is stuck inactive since forever, current state incomplete, last acting [94,78]
pg 108.176 is stuck inactive for 35922.975233, current state remapped+incomplete, last acting [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,8,231,176,299,175,118,277,293,100,44]
pg 108.176 is stuck unclean for 69383.394860, current state remapped+incomplete, last acting [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,8,231,176,299,175,118,277,293,100,44]
pg 19.e8d is stuck unclean since forever, current state incomplete, last acting [94,78]
pg 19.e8d is incomplete, acting [94,78]
pg 108.176 is remapped+incomplete, acting [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,8,231,176,299,175,118,277,293,100,44] (reducing pool .rgw.buckets.erasure min_size from 12 may help; search ceph.com/docs for 'incomplete')

# ceph health detail | grep unfound
HEALTH_WARN 679 pgs backfill; 2 pgs backfilling; 3141 pgs degraded; 2 pgs incomplete; 2488 pgs recovery_wait; 3141 pgs stuck degraded; 2 pgs stuck inactive; 3171 pgs stuck unclean; 1226 pgs stuck undersized; 1226 pgs undersized; 103 requests are blocked > 32 sec; 2 osds have slow requests; recovery 15240378/686758679 objects degraded (2.219%); recovery 23313887/686758679 objects misplaced (3.395%); recovery 79/102730392 unfound (0.000%); noout,noscrub,nodeep-scrub flag(s) set
pg 4.5d3 is active+recovery_wait+undersized+degraded+remapped, acting [208], 15 unfound
pg 19.5c4 is active+recovery_wait+undersized+degraded+remapped, acting [208], 15 unfound
pg 4.4a7 is active+recovery_wait+undersized+degraded+remapped, acting [201], 13 unfound
pg 19.498 is active+recovery_wait+undersized+degraded+remapped, acting [201], 13 unfound
pg 4.1d0 is active+recovery_wait+undersized+degraded+remapped, acting [208], 13 unfound
pg 19.1c1 is active+recovery_wait+undersized+degraded+remapped, acting [208], 10 unfound
recovery 79/102730392 unfound (0.000%)
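To make the output above easier to read, here is a rough Python sketch that tallies the per-PG lines. It is only an illustration, not a Ceph tool: the names `parse_pg_lines`, `present_shards`, `SAMPLE` and `K_DATA_CHUNKS` are made up for this example, the regular expression covers only the line shapes quoted in this post, and `K_DATA_CHUNKS = 12` is an assumption taken from the "12+4" pool description. The value 2147483647 (0x7fffffff) in an acting set is the placeholder Ceph prints when no OSD is mapped for that shard.

```python
import re

# Placeholder Ceph prints in an acting set when a shard has no OSD (0x7fffffff).
CRUSH_ITEM_NONE = 2147483647

# Assumption: k = 12 data chunks, taken from the "12+4" pool in the post.
K_DATA_CHUNKS = 12

# Lines copied from the `ceph health detail` output in this post.
SAMPLE = """\
pg 19.e8d is incomplete, acting [94,78]
pg 108.176 is remapped+incomplete, acting [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,8,231,176,299,175,118,277,293,100,44] (reducing pool .rgw.buckets.erasure min_size from 12 may help; search ceph.com/docs for 'incomplete')
pg 4.5d3 is active+recovery_wait+undersized+degraded+remapped, acting [208], 15 unfound
pg 19.5c4 is active+recovery_wait+undersized+degraded+remapped, acting [208], 15 unfound
pg 4.4a7 is active+recovery_wait+undersized+degraded+remapped, acting [201], 13 unfound
pg 19.498 is active+recovery_wait+undersized+degraded+remapped, acting [201], 13 unfound
pg 4.1d0 is active+recovery_wait+undersized+degraded+remapped, acting [208], 13 unfound
pg 19.1c1 is active+recovery_wait+undersized+degraded+remapped, acting [208], 10 unfound
"""

# Matches "pg <id> is <state>, acting [<osds>]" with an optional ", N unfound".
PG_LINE = re.compile(r"^pg (\S+) is (\S+), acting \[([\d,]+)\](?:, (\d+) unfound)?")


def parse_pg_lines(text):
    """Return one dict per PG line that matches PG_LINE; other lines are skipped."""
    pgs = []
    for line in text.splitlines():
        m = PG_LINE.match(line.strip())
        if m is None:
            continue
        pgid, state, acting, unfound = m.groups()
        pgs.append({
            "pgid": pgid,
            "states": state.split("+"),
            "acting": [int(osd) for osd in acting.split(",")],
            "unfound": int(unfound) if unfound else 0,
        })
    return pgs


def present_shards(pg):
    """Count acting-set entries that point at a real OSD."""
    return sum(1 for osd in pg["acting"] if osd != CRUSH_ITEM_NONE)


pgs = parse_pg_lines(SAMPLE)
incomplete = [pg for pg in pgs if "incomplete" in pg["states"]]
total_unfound = sum(pg["unfound"] for pg in pgs)

for pg in incomplete:
    print(pg["pgid"], "shards mapped:", present_shards(pg), "of", len(pg["acting"]))
print("total unfound:", total_unfound)
```

On the lines quoted in this post, the per-PG unfound counts add up to 79, which matches the summary line, and pg 108.176 has only 10 of its 16 shards mapped to an OSD. If k really is 12, that is fewer shards than are needed to rebuild the data, which would be consistent with the PG staying incomplete.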