Can you get the details of:
1. ceph health detail
2. ceph pg <pg-num> query for any one PG stuck peering

Varada

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Chris Dunlop
> Sent: Monday, December 14, 2015 8:22 AM
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: All pgs stuck peering
>
> Hi,
>
> ceph 0.94.5
>
> After restarting one of our three osd hosts to increase the RAM and change
> from linux 3.18.21 to 4.1, the cluster is stuck with all pgs peering:
>
> # ceph -s
>     cluster c6618970-0ce0-4cb2-bc9a-dd5f29b62e24
>      health HEALTH_WARN
>             3072 pgs peering
>             3072 pgs stuck inactive
>             3072 pgs stuck unclean
>             1450 requests are blocked > 32 sec
>             noout flag(s) set
>      monmap e9: 3 mons at
> {b2=10.200.63.130:6789/0,b4=10.200.63.132:6789/0,b5=10.200.63.133:6789/0}
>             election epoch 74462, quorum 0,1,2 b2,b4,b5
>      osdmap e356963: 59 osds: 59 up, 59 in
>             flags noout
>       pgmap v69385733: 3072 pgs, 3 pools, 11973 GB data, 3340 kobjects
>             31768 GB used, 102 TB / 133 TB avail
>                 3072 peering
>
> What can I do to diagnose (or, better yet, fix!) this?
>
> Downgrading back to 3.18.21 hasn't helped.
>
> Each host (now) has 192G RAM. One has 17 osds, the other two have 21 osds
> each.
>
> I can see there's traffic going between the osd ports on the various osd
> hosts, but it's all small packets (122 or 131 bytes).
>
> Just prior to upgrading this osd host, another one had also been upgraded
> (RAM + linux). The cluster had no trouble at that point and was healthy
> within a few minutes of that server starting up.
>
> The cluster has been working fine for years up to now, having had rolling
> upgrades since dumpling.
>
> Cheers,
>
> Chris
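
For anyone following the thread, a minimal sketch of the commands Varada is asking for, run on any monitor or admin node. The PG id 2.0 below is a placeholder; substitute any PG the cluster reports as stuck peering. The final unset-noout step is an assumption about the intended maintenance flow (the status shows noout set), not something confirmed in the thread:

    # list stuck PGs and pick one to inspect (2.0 below is a placeholder id)
    ceph health detail
    ceph pg dump_stuck inactive
    ceph pg 2.0 query > pg-2.0-query.json   # recovery_state shows what peering is waiting on
    # assumption: once all restarted OSDs are confirmed up and peering completes,
    # clear the noout flag that was set for the maintenance
    ceph osd unset noout

The pg query output's recovery_state section is usually the most useful part here, since it names the OSDs (if any) that peering is blocked on.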