Re: Still CRUSH problems with 0.94.1 ?

We had a very similar problem, but it was repeatable on Firefly as well. For us, it turned out that the MTU on the switches was not consistently configured for 9000 byte frames. This prevented the peering process from completing, and things got worse as data was added.

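One quick way to verify jumbo frames end to end (a minimal sketch, assuming Linux iputils ping and a 9000 byte MTU; the host name is a placeholder) is a do-not-fragment ping between every pair of hosts on both the public and cluster networks:

    # 8972 byte payload + 8 byte ICMP header + 20 byte IP header = 9000 bytes
    ping -M do -c 3 -s 8972 <peer-host>

If a switch port or an interface along the path is not configured for jumbo frames, these pings fail while ordinary pings still succeed, which matches the half-working state that left our PGs stuck in peering.
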
Here is a section I wrote for our internal documentation (I'm going to try and get this into the official documentation when I have some time).

What To Do If A Cluster Just Won't Get Healthy

Sometimes it seems a cluster just won't reach a healthy state. This is most obvious when some PGs are stuck in a peering state for a long time. If looking through the logs doesn't give you any ideas, here is a list of things to verify. Don't skip any of these steps: they are often overlooked, yet any one of them can leave a cluster stuck.

  1. Verify firewall settings. The firewall should either be off, or TCP ports 6789 and 6800-6899 should allow incoming connections on both the public and cluster networks (example rules after this list).

    • iptables -L -n

  2. Verify that the clocks on all hosts are closely synchronized and that ntpd is running.

    • date; ps aux | grep ntpd

  3. Verify that every host can exchange jumbo frames with every other host on both the public and cluster networks, e.g. using the do-not-fragment ping shown above.

  4. Check whether there are zombie OSD entries in the CRUSH map. OSDs listed with a status of DNE ("does not exist") have never been completely cleaned out of the CRUSH map (see the check after this list).

    • If you are unsure if an OSD is completely dead or might come back, you can set the weight of that OSD to zero

      • ceph osd crush reweight osd.XX 0.000

    • If you know that the OSD will not ever come back, you can remove it completely

      • ceph osd crush remove osd.X

  5. Check that you aren't running into an open file limit. Ceph OSDs started by udev or SysV init/upstart/systemd should handle this pretty well, but if the processes are started manually, be sure to check (see the /proc check after this list).

    • ps aux | grep ceph-osd (the process command line should include a ulimit invocation)

    • grep "Too many open files" /var/log/ceph/ceph-osd.* (should not return anything)
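
For step 1, if iptables is in use, rules along these lines need to be present on every host (a sketch only; chain names, interfaces, and source networks will differ per site):

    iptables -A INPUT -p tcp --dport 6789 -j ACCEPT
    iptables -A INPUT -p tcp --dport 6800:6899 -j ACCEPT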

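For step 4, the zombie entries are easy to spot because they are reported with a status of DNE in the OSD tree (a sketch, assuming the default plain output):

    ceph osd tree | grep DNE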

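For step 5, the effective limits of a running OSD can also be read straight from /proc (a sketch; it assumes the daemon binary is named ceph-osd):

    # 'Max open files' should be well above the default of 1024
    for pid in $(pidof ceph-osd); do grep "Max open files" /proc/$pid/limits; done
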
On Tue, Apr 21, 2015 at 7:03 AM, fred@xxxxxxxxxx <fred@xxxxxxxxxx> wrote:
Hi all,

Might there be a problem with the CRUSH function during a 'from scratch' installation of 0.94.1-0?

I have tested this many times, with either ceph-deploy-1.5.22-0 or ceph-deploy-1.5.23-0, on RHEL7.

Each time, the new cluster ends up in a weird state I never saw with my previously installed versions (0.94, 0.87.1):
- I've seen things that are perhaps linked to ceph-deploy-1.5.23-0, such as one or more monitors being unable to form the cluster (with respawning 'python /usr/sbin/ceph-create-keys' messages). But I think that's a separate part of the issue.
- the main issue shows up as a warning on the health of the PGs as soon as the cluster is formed enough to answer a 'ceph -s'.

- here is a 1-mon, almost empty, freshly installed cluster:

ROOT > ceph -s
   cluster e581ab43-d0f5-4ea8-811f-94c8df16d044
    health HEALTH_WARN
           2 pgs degraded
           14 pgs peering
           4 pgs stale
           2 pgs stuck degraded
           25 pgs stuck inactive
           4 pgs stuck stale
           27 pgs stuck unclean
           2 pgs stuck undersized
           2 pgs undersized
           too few PGs per OSD (3 < min 30)
    monmap e1: 1 mons at {helga=10.10.10.64:6789/0}
           election epoch 2, quorum 0 helga
    osdmap e398: 60 osds: 60 up, 60 in; 2 remapped pgs
     pgmap v1553: 64 pgs, 1 pools, 0 bytes data, 0 objects
           2829 MB used, 218 TB / 218 TB avail
                 37 active+clean
                 12 peering
                 11 activating
                  2 stale+active+undersized+degraded
                  2 stale+remapped+peering

With time, the number of defects grows. They literally explode if we put objects on the cluster.

- a 'ceph health detail' shows, for example, entries like this one:
pg 0.22 is stuck inactive since forever, current state peering, last acting [18,17,0]

- A query on the PG shows
ceph pg  0.22 query
{
   "state": "peering",
../..
    "up": [
       18,
       17,
       0
   ],
          "blocked_by": [
               0,
               1,
               5,
               17
           ],
../..
}


If my understanding of the ceph query is correct, OSDs 1, 5 and 17 have nothing to do with this PG... Where do they come from?
Couldn't this be part of the "critical issues with CRUSH" that 0.94.1 is meant to correct?

Frederic
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
