Hi Everyone,

Just an update to this in case anyone has the same issue. This seems to have been caused by ceph osd reweight-by-utilization. Because we have two pools that map to two separate sets of disks, and one pool was fuller than the other, reweight-by-utilization had reduced the reweight of the OSDs in one pool down to around 0.3. This seems to have left CRUSH unable to find a suitable OSD for the second copy. Raising the reweights back up to near 1 has resolved the issue. (A quick summary of the relevant commands is at the end of this message, after the quoted report.)

Regards,
Richard

On 25 January 2017 at 10:58, Richard Bade <hitrich@xxxxxxxxx> wrote:
> Hi Everyone,
> I've got a strange one. After doing a reweight of some OSDs the other
> night, our cluster is showing 1 pg stuck unclean.
>
> 2017-01-25 09:48:41 : 1 pgs stuck unclean | recovery 140/71532872
> objects degraded (0.000%) | recovery 2553/71532872 objects misplaced
> (0.004%)
>
> When I query the pg, it shows that one of the OSDs is not up:
>
>     "state": "active+remapped",
>     "snap_trimq": "[]",
>     "epoch": 231928,
>     "up": [
>         155
>     ],
>     "acting": [
>         155,
>         105
>     ],
>     "actingbackfill": [
>         "105",
>         "155"
>     ],
>
> I've tried restarting the OSDs, ceph pg repair, ceph pg 4.559
> list_missing, and ceph pg 4.559 mark_unfound_lost revert.
> Nothing works.
> I've also tried setting osd.105 out, waiting for backfill to evacuate
> the OSD, and stopping the OSD process to see if it would recreate the
> second copy, but no luck.
> It would seem that the primary copy of the data on osd.155 is fine, but
> the second copy on osd.105 isn't there.
>
> Any ideas how I can force a rebuild of the second copy? Or any other
> ideas to resolve this?
>
> We're running Hammer:
> ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
>
> Regards,
> Richard
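
A rough sketch of the commands involved, in case it helps anyone; the OSD id (105) and the target weight here are just examples from this thread, so adjust them to your own cluster:

    # Show per-OSD utilisation and the current override reweights
    # (the REWEIGHT column is what reweight-by-utilization adjusts)
    ceph osd df
    ceph osd tree

    # Raise the override reweight of an affected OSD back toward 1
    ceph osd reweight 105 1.0

Note that ceph osd reweight sets the temporary override weight (0 to 1), not the CRUSH weight (that's ceph osd crush reweight), so it is the knob that undoes what reweight-by-utilization changed.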