On 08/03/2018 01:45 PM, Pawel S wrote:
> hello!
>
> We did maintenance work (cluster shrinking) on one cluster (jewel),
> and after shutting down one of the OSDs we hit a situation where
> recovery of a PG cannot start because it is stuck "querying" one of
> its peers. We restarted that OSD and tried marking it out and back
> in; nothing helped. Finally we moved the data off it (the PG was
> still on it) and removed the OSD from CRUSH and from the whole
> cluster. But recovery still won't start on any other OSD to recreate
> the copy. We still have two valid, active copies, but we would like
> the cluster to be clean.
> How can we push recovery to place this third copy somewhere?
> Replication size is 3 across hosts, and there are plenty of them.
>
> Status now:
>      health HEALTH_WARN
>             1 pgs degraded
>             1 pgs stuck degraded
>             1 pgs stuck unclean
>             1 pgs stuck undersized
>             1 pgs undersized
>             recovery 268/19265130 objects degraded (0.001%)
>
> Link to PG query details, health status and version commit here:
> https://gist.github.com/pejotes/aea71ecd2718dbb3ceab0e648924d06b

Can you add the output of 'ceph osd tree', 'ceph osd crush
show-tunables' and 'ceph osd crush rule dump'?

It looks like CRUSH is not able to find a place for the 3rd copy,
likely because of a big difference in the weights of your racks or
hosts, depending on your CRUSH rules.

--
PS
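
P.S. If you want to verify offline whether CRUSH can map three
replicas at all with your current map, you can replay the map through
crushtool. A minimal sketch, assuming your pool uses rule id 0 (check
'ceph osd crush rule dump' for the real id):

    # Pull the compiled CRUSH map out of the cluster and decompile it
    # so you can read the rules and weights
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # Simulate placements; every input that cannot be mapped to 3
    # distinct OSDs is reported as a bad mapping
    crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-bad-mappings

If bad mappings show up with --num-rep 3 but not with 2, CRUSH is
giving up on the third replica (typically it runs out of retries
inside an underweighted rack/host); raising the choose_total_tries
tunable or evening out the weights should let recovery place the copy.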