What To Do If A Cluster Just Won't Get Healthy
Sometimes it seems a cluster just won't reach a healthy state. The most obvious symptom is PGs that stay stuck in a peering state for a long time. If looking through the logs doesn't give you any ideas, here is a list of things to verify. Don't skip any steps: each of these is often overlooked, and each can leave a cluster stuck.
Verify firewall settings. The firewall should either be off, or TCP port 6789 and ports 6800-6899 should accept incoming connections on both the public and cluster networks.
iptables -L -n
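As a minimal sketch for hosts that still manage iptables directly (the 10.10.10.0/24 subnet below is only a placeholder for your own public/cluster networks; firewalld-based hosts need the equivalent firewall-cmd rules):
# allow the monitor port and the OSD port range from the Ceph networks
iptables -A INPUT -p tcp -s 10.10.10.0/24 --dport 6789 -j ACCEPT
iptables -A INPUT -p tcp -s 10.10.10.0/24 --dport 6800:6899 -j ACCEPT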
Verify that the clocks on all hosts are closely synchronized and that ntpd is running.
date; ps aux | grep ntpd
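If ntpd is running, ntpq can confirm that the host is actually synchronized to a peer; every host should show a peer marked with '*' and a small offset (a few milliseconds at most).
ntpq -p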
Verify that each host can communicate with every other host using jumbo frames on both the public and cluster networks.
ping -c 4 -s 8970 www.xxx.yyy.zzz
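Note that without a don't-fragment flag, the ping above can succeed even when a device in the path fragments or drops jumbo frames. A stricter sketch, assuming an MTU of 9000 and using the same placeholder address:
# 8972 bytes of payload + 28 bytes of ICMP/IP headers = 9000; -M do forbids fragmentation
ping -c 4 -M do -s 8972 www.xxx.yyy.zzz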
Check whether there are zombie OSD entries in the CRUSH map. OSDs listed as DNE (does not exist) have not been cleaned out of the CRUSH map completely.
If you are unsure whether an OSD is completely dead or might come back, you can set its CRUSH weight to zero:
ceph osd crush reweight osd.XX 0.000
If you know that the OSD will never come back, you can remove it completely:
ceph osd crush remove osd.XX
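The DNE entries show up in the OSD tree. For an OSD that is already down and out, the usual cleanup is roughly the following sketch (XX stands for the OSD id, as above): remove it from the CRUSH map, the auth database, and the OSD map.
ceph osd tree                  # DNE entries are visible here
ceph osd crush remove osd.XX
ceph auth del osd.XX
ceph osd rm XX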
Check that you aren't running into an open file limit. Ceph OSDs started by udev or SysV/upstart/systemd should handle this well, but if the processes are started manually, be sure to check.
ps aux | grep ceph-osd (the process command line should include a ulimit call)
grep "Too many open files" /var/log/ceph/ceph-osd.* (should not return anything)
Hi all,
might there be a problem with the CRUSH function during a 'from scratch' installation of 0.94.1-0?
I have tested this many times, with ceph-deploy-1.5.22-0 or ceph-deploy-1.5.23-0, on RHEL7.
Each time, the new cluster ends up in a weird state I never saw with my previously installed versions (0.94, 0.87.1):
- I've seen things that are perhaps linked to ceph-deploy-1.5.23-0, such as one or more monitors being unable to form the cluster (with respawning 'python /usr/sbin/ceph-create-keys' messages). But I think that's a separate part of the issue.
- the main issue shows up as a warning on the health of the PGs as soon as the cluster is formed enough to answer a 'ceph -s'.
- here is a freshly installed, almost empty cluster with 1 mon:
ROOT > ceph -s
    cluster e581ab43-d0f5-4ea8-811f-94c8df16d044
     health HEALTH_WARN
            2 pgs degraded
            14 pgs peering
            4 pgs stale
            2 pgs stuck degraded
            25 pgs stuck inactive
            4 pgs stuck stale
            27 pgs stuck unclean
            2 pgs stuck undersized
            2 pgs undersized
            too few PGs per OSD (3 < min 30)
     monmap e1: 1 mons at {helga=10.10.10.64:6789/0}
            election epoch 2, quorum 0 helga
     osdmap e398: 60 osds: 60 up, 60 in; 2 remapped pgs
      pgmap v1553: 64 pgs, 1 pools, 0 bytes data, 0 objects
            2829 MB used, 218 TB / 218 TB avail
                  37 active+clean
                  12 peering
                  11 activating
                  2 stale+active+undersized+degraded
                  2 stale+remapped+peering
With time, the number of defects grows. They literally explode if we put objects on the cluster.
- 'ceph health detail' shows, for example, entries like this one:
pg 0.22 is stuck inactive since forever, current state peering, last acting [18,17,0]
- A query on the PG shows:
ceph pg 0.22 query
{
    "state": "peering",
    ../..
    "up": [
        18,
        17,
        0
    ],
    "blocked_by": [
        0,
        1,
        5,
        17
    ],
    ../..
}
If my understanding of the ceph query is correct, OSDs 1, 5 and 17 have nothing to do with this PG... Where do they come from?
Couldn't this be part of the "critical issues with CRUSH" that 0.94.1 is meant to correct?
Frederic
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com