Re: Still CRUSH problems with 0.94.1 ?

We had a very similar problem, but it was repeatable on Firefly as well. For us, it turned out that the MTU on the switches was not consistently configured for 9000 byte frames. This prevented the peering process from completing, and things got worse as data was added.

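One quick way to verify jumbo frames end to end (a minimal sketch, assuming Linux iputils ping and a 9000 byte MTU; the host name is a placeholder) is a do-not-fragment ping between every pair of hosts on both the public and cluster networks:

    # 8972 byte payload + 8 byte ICMP header + 20 byte IP header = 9000 bytes
    ping -M do -c 3 -s 8972 <peer-host>

If a switch port or an interface along the path is not configured for jumbo frames, these pings fail while ordinary pings still succeed, which matches the half-working state that left our PGs stuck in peering.
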
Here is a section I wrote for our internal documentation (I'm going to try and get this into the official documentation when I have some time).

What To Do If A Cluster Just Won't Get Healthy

Sometimes it seems a cluster just won't reach a healthy state. This is most obvious when some PGs are stuck in a peering state for a long time. If looking through the logs doesn't give you any ideas, here is a list of things to verify. Don't skip any of these steps: they are often overlooked, yet any one of them can leave a cluster stuck.

  1. Verify firewall settings. The firewall should either be off, or TCP ports 6789 and 6800-6899 should allow incoming connections on both the public and cluster networks (example rules after this list).

    • iptables -L -n

  2. Verify that the clocks on all hosts are closely synchronized and that ntpd is running.

    • date; ps aux | grep ntpd

  3. Verify that every host can exchange jumbo frames with every other host on both the public and cluster networks, e.g. using the do-not-fragment ping shown above.

  4. Check whether there are zombie OSD entries in the CRUSH map. OSDs listed with a status of DNE ("does not exist") have never been completely cleaned out of the CRUSH map (see the check after this list).

    • If you are unsure if an OSD is completely dead or might come back, you can set the weight of that OSD to zero

      • ceph osd crush reweight osd.XX 0.000

    • If you know that the OSD will not ever come back, you can remove it completely

      • ceph osd crush remove osd.X

  5. Check that you aren't running into an open file limit. Ceph OSDs started by udev or SysV init/upstart/systemd should handle this pretty well, but if the processes are started manually, be sure to check (see the /proc check after this list).

    • ps aux | grep ceph-osd (the process command line should include a ulimit invocation)

    • grep "Too many open files" /var/log/ceph/ceph-osd.* (should not return anything)
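
For step 1, if iptables is in use, rules along these lines need to be present on every host (a sketch only; chain names, interfaces, and source networks will differ per site):

    iptables -A INPUT -p tcp --dport 6789 -j ACCEPT
    iptables -A INPUT -p tcp --dport 6800:6899 -j ACCEPT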

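For step 4, the zombie entries are easy to spot because they are reported with a status of DNE in the OSD tree (a sketch, assuming the default plain output):

    ceph osd tree | grep DNE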

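For step 5, the effective limits of a running OSD can also be read straight from /proc (a sketch; it assumes the daemon binary is named ceph-osd):

    # 'Max open files' should be well above the default of 1024
    for pid in $(pidof ceph-osd); do grep "Max open files" /proc/$pid/limits; done
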
On Tue, Apr 21, 2015 at 7:03 AM, fred@xxxxxxxxxx <fred@xxxxxxxxxx> wrote:
Hi all,

Might there be a problem with the CRUSH function during a 'from scratch' installation of 0.94.1-0?

I have tested this many times, with either ceph-deploy-1.5.22-0 or ceph-deploy-1.5.23-0, on RHEL7.

Each time, the new cluster ends up in a weird state I never saw with my previously installed versions (0.94, 0.87.1):
- I've seen things that are perhaps linked to ceph-deploy-1.5.23-0, such as one or more monitors being unable to form the cluster (with respawning 'python /usr/sbin/ceph-create-keys' messages). But I think that's a separate part of the issue.
- the main issue shows up as a warning on the health of the PGs as soon as the cluster is formed enough to answer a 'ceph -s'.

- here is a 1-mon, almost empty, freshly installed cluster:

ROOT > ceph -s
   cluster e581ab43-d0f5-4ea8-811f-94c8df16d044
    health HEALTH_WARN
           2 pgs degraded
           14 pgs peering
           4 pgs stale
           2 pgs stuck degraded
           25 pgs stuck inactive
           4 pgs stuck stale
           27 pgs stuck unclean
           2 pgs stuck undersized
           2 pgs undersized
           too few PGs per OSD (3 < min 30)
    monmap e1: 1 mons at {helga=10.10.10.64:6789/0}
           election epoch 2, quorum 0 helga
    osdmap e398: 60 osds: 60 up, 60 in; 2 remapped pgs
     pgmap v1553: 64 pgs, 1 pools, 0 bytes data, 0 objects
           2829 MB used, 218 TB / 218 TB avail
                 37 active+clean
                 12 peering
                 11 activating
                  2 stale+active+undersized+degraded
                  2 stale+remapped+peering

With time, the number of defects grows. They literally explode if we put objects on the cluster.

- a 'ceph health detail' shows, for example, entries like this one:
pg 0.22 is stuck inactive since forever, current state peering, last acting [18,17,0]

- A query on the PG shows
ceph pg  0.22 query
{
   "state": "peering",
../..
    "up": [
       18,
       17,
       0
   ],
          "blocked_by": [
               0,
               1,
               5,
               17
           ],
../..
}


If my understanding of the ceph query is correct, OSDs 1, 5 and 17 have nothing to do with this PG... Where do they come from?
Couldn't this be part of the "critical issues with CRUSH" that 0.94.1 is meant to correct?

Frederic
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
