Check your firewall rules. If the OSDs can't reach the monitors or each other on the OSD ports, their heartbeats fail and the monitors will mark them down.
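Something like the following should show whether the Ceph ports are actually open on each host. This is only a sketch and assumes firewalld on CentOS/RHEL with the default public zone and the default Ceph ports; adjust to whatever your hosts actually run:

# See what the firewall currently permits on an OSD host
sudo firewall-cmd --list-all

# Or, if the hosts use plain iptables:
sudo iptables -L -n

# Monitors listen on 6789/tcp and OSDs use 6800-7300/tcp by default,
# so opening those would look roughly like this:
sudo firewall-cmd --zone=public --add-port=6789/tcp --permanent
sudo firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent
sudo firewall-cmd --reload

If the OSDs stay up after that, the firewall was the culprit.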
On Fri, Apr 1, 2016 at 10:28 AM, Nate Curry <curry@xxxxxxxxxxxxx> wrote:
I am having some issues with my newly set up cluster. I am able to get all 32 of my OSDs to start after setting up udev rules for my journal partitions, but they keep going down. At first it seemed like half of them would stay up, but when I checked this morning only about a quarter of them were up according to "ceph osd tree". The systemd units are running, so that doesn't seem to be the issue, and I don't see anything glaring in the log files, which may just reflect my experience level with Ceph.
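For reference, this is roughly what I have been checking on each OSD host (the unit name and log path are the package defaults here, so adjust if yours differ):

# Confirm a given OSD daemon is still running (osd.0 as an example)
systemctl status ceph-osd@0

# Look at the tail of that OSD's log for anything obvious
tail -n 100 /var/log/ceph/ceph-osd.0.log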
I tried to track down the errors and knock out any that seemed obvious, but I haven't managed to do that either. The pool was initially created with 64 PGs and I tried to increase that to 1024, but the cluster hasn't finished creating them all and seems stuck with 270 pgs in stale+creating. That is preventing me from raising pgp_num to match, since it says it is busy creating pgs.
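For reference, the pool changes I am attempting look roughly like this:

# This was accepted and raised pg_num on the rbd pool to 1024
ceph osd pool set rbd pg_num 1024

# This is the one that gets refused while the new pgs are still being created
ceph osd pool set rbd pgp_num 1024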
I am thinking that the downed OSDs are probably the reason the pgs aren't getting created; I just can't seem to find out why they are going down. Could someone help shine some light on this for me?

[ceph@matm-cm1 ~]$ ceph status
    cluster 5a463eb9-b918-4d97-b853-7a5ebd3c0ac2
     health HEALTH_ERR
            1006 pgs are stuck inactive for more than 300 seconds
            1 pgs degraded
            140 pgs down
            736 pgs peering
            1024 pgs stale
            1006 pgs stuck inactive
            18 pgs stuck unclean
            1 pgs undersized
            pool rbd pg_num 1024 > pgp_num 64
     monmap e1: 3 mons at {matm-cm1=192.168.41.153:6789/0,matm-cm2=192.168.41.154:6789/0,matm-cm3=192.168.41.155:6789/0}
            election epoch 8, quorum 0,1,2 matm-cm1,matm-cm2,matm-cm3
     osdmap e417: 32 osds: 9 up, 9 in; 496 remapped pgs
            flags sortbitwise
      pgmap v1129: 1024 pgs, 1 pools, 0 bytes data, 0 objects
            413 MB used, 16753 GB / 16754 GB avail
                 564 stale+remapped+peering
                 270 stale+creating
                 125 stale+down+remapped+peering
                  32 stale+peering
                  17 stale+active+remapped
                  15 stale+down+peering
                   1 stale+active+undersized+degraded+remapped
[ceph@matm-cm1 ~]$ ceph osd tree
ID WEIGHT   TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 58.17578 root default
-2 14.54395     host matm-cs1
 0  1.81799         osd.0         down        0          1.00000
 1  1.81799         osd.1         down        0          1.00000
 2  1.81799         osd.2         down        0          1.00000
 3  1.81799         osd.3         down        0          1.00000
 4  1.81799         osd.4         down        0          1.00000
 5  1.81799         osd.5         down        0          1.00000
 6  1.81799         osd.6         down        0          1.00000
 7  1.81799         osd.7         down        0          1.00000
-3 14.54395     host matm-cs2
 8  1.81799         osd.8           up  1.00000          1.00000
 9  1.81799         osd.9           up  1.00000          1.00000
10  1.81799         osd.10          up  1.00000          1.00000
11  1.81799         osd.11          up  1.00000          1.00000
12  1.81799         osd.12          up  1.00000          1.00000
13  1.81799         osd.13          up  1.00000          1.00000
14  1.81799         osd.14          up  1.00000          1.00000
15  1.81799         osd.15          up  1.00000          1.00000
-4 14.54395     host matm-cs3
16  1.81799         osd.16        down        0          1.00000
17  1.81799         osd.17        down        0          1.00000
18  1.81799         osd.18        down        0          1.00000
19  1.81799         osd.19        down        0          1.00000
20  1.81799         osd.20        down        0          1.00000
21  1.81799         osd.21        down        0          1.00000
22  1.81799         osd.22        down        0          1.00000
23  1.81799         osd.23        down        0          1.00000
-5 14.54395     host matm-cs4
24  1.81799         osd.24        down        0          1.00000
31  1.81799         osd.31        down        0          1.00000
25  1.81799         osd.25        down        0          1.00000
27  1.81799         osd.27        down        0          1.00000
29  1.81799         osd.29        down        0          1.00000
28  1.81799         osd.28        down        0          1.00000
30  1.81799         osd.30          up  1.00000          1.00000
26  1.81799         osd.26        down        0          1.00000

Nate Curry
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com