Hi,

ceph 0.94.5

After restarting one of our three osd hosts to increase the RAM and change from Linux 3.18.21 to 4.1, the cluster is stuck with all pgs peering:

# ceph -s
    cluster c6618970-0ce0-4cb2-bc9a-dd5f29b62e24
     health HEALTH_WARN
            3072 pgs peering
            3072 pgs stuck inactive
            3072 pgs stuck unclean
            1450 requests are blocked > 32 sec
            noout flag(s) set
     monmap e9: 3 mons at {b2=10.200.63.130:6789/0,b4=10.200.63.132:6789/0,b5=10.200.63.133:6789/0}
            election epoch 74462, quorum 0,1,2 b2,b4,b5
     osdmap e356963: 59 osds: 59 up, 59 in
            flags noout
      pgmap v69385733: 3072 pgs, 3 pools, 11973 GB data, 3340 kobjects
            31768 GB used, 102 TB / 133 TB avail
                3072 peering

What can I do to diagnose (or better yet, fix!) this? Downgrading back to 3.18.21 hasn't helped.

Each host (now) has 192G RAM. One has 17 osds, the other two have 21 osds each.

I can see there's traffic going between the osd ports on the various osd hosts, but all small packets (122 or 131 bytes).

Just prior to upgrading this osd host, another one had also been upgraded (RAM + Linux). The cluster had no trouble at that point and was healthy within a few minutes of that server starting up.

The cluster has been working fine for years up to now, having had rolling upgrades since dumpling.

Cheers,
Chris
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
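[Editor's note: a minimal sketch of the usual first-line diagnostics for PGs stuck in peering, assuming a Hammer-era (0.94.x) ceph CLI run on a mon or osd host with the admin keyring; the guard clause and the example pgid `0.1` are additions for illustration, not from the original post.]

```shell
#!/bin/sh
# Sketch: first-line diagnostics for PGs stuck in "peering".
# Assumes the Hammer-era ceph CLI on a cluster node with client.admin access.
set -u

# Guard (added so the sketch exits cleanly on a host without the ceph CLI).
command -v ceph >/dev/null 2>&1 || { echo "ceph CLI not found"; exit 0; }

# Full health detail, including which OSDs the blocked requests implicate.
ceph health detail

# List the PGs stuck inactive, then pick one pgid from the output.
ceph pg dump_stuck inactive

# Query a single stuck PG (replace 0.1 with a real pgid from the list);
# its recovery_state section shows what peering is waiting on.
ceph pg 0.1 query
```

If `recovery_state` reports peering blocked on a particular OSD, the admin socket on that OSD's host (`ceph daemon osd.N dump_ops_in_flight`) can show what it is stuck doing.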
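[Editor's note: seeing only small (heartbeat-sized) packets between osd ports after a reboot is consistent with an MTU mismatch, where small messages pass but full-size frames are dropped. A hedged way to check is a don't-fragment ping between the osd hosts; the peer hostname is a placeholder, and the default of loopback is only so the sketch runs anywhere.]

```shell
#!/bin/sh
# Sketch: verify that full-size frames pass between osd hosts.
# Usage: sh mtu-check.sh <peer-host>   (defaults to loopback for illustration)
HOST="${1:-127.0.0.1}"
# -M do forbids fragmentation; -s 1472 + 28 bytes of ICMP/IP headers makes a
# full 1500-byte frame. Use -s 8972 if the cluster network runs 9000-byte
# jumbo frames -- a reboot can silently drop a jumbo-frame interface setting.
ping -c 2 -M do -s 1472 "$HOST"
```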