pg stuck in remapped+peering for a long time

Peter Theobald <pete@xxxxxxxxxxxxxxx> · Sat, 14 Nov 2015 12:12:06 +0000

Hi list,

I have a 3-node Ceph cluster with a total of 9 OSDs (2, 3 and 4 per node, with different size drives). I changed the layout (failure domain from per-OSD to per-host, and changed min_size) and I now have a few PGs that have been stuck in peering or remapped+peering for a couple of days.

The hosts are underpowered: 2x HP MicroServers and a single i5 desktop-grade machine. The network is fast though (bonded gigabit ethernet with a dedicated switch).

I'm concerned that the remapped+peering PGs are stuck. All the PGs in peering or remapped+peering are stuck inactive and unclean, so I'm worried about data loss. Do I just need to wait for them to fix themselves? I cannot see any mention of unfound objects when I query the remapped PGs, so I think I'm OK and just need to be patient. I have 128 PGs across 9 OSDs, so each PG probably holds a lot of objects. Total data is about 4TB.
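For a rough sense of scale, here is a back-of-envelope sketch of how much data each PG holds. It only uses the figures above and assumes that "4TB" means 4 * 10**12 bytes of stored data spread evenly over the PGs, which may well not hold in practice. The function name is made up for illustration.

```python
# Rough estimate of data per placement group, using the figures from this post.
# Assumptions: 4 TB is decimal (10**12 bytes) and data is evenly spread over PGs.
TOTAL_DATA_BYTES = 4 * 10**12
PG_COUNT = 128


def data_per_pg_gb(total_bytes: int, pg_count: int) -> float:
    """Return the average amount of data per PG in decimal gigabytes."""
    return total_bytes / pg_count / 10**9


if __name__ == "__main__":
    per_pg = data_per_pg_gb(TOTAL_DATA_BYTES, PG_COUNT)
    print(f"about {per_pg:.2f} GB per PG")  # about 31.25 GB per PG
```

So on these assumptions each PG holds roughly 31 GB, which is a lot for the slower hosts to move if a PG has to be remapped.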

Regards

Pete

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com