OSDs keep going down

I am having some issues with my newly set up cluster. I can get all 32 of my OSDs to start after setting up udev rules for my journal partitions, but they keep going down. At first about half of them seemed to stay up, but when I checked this morning, "ceph osd tree" showed only about a quarter of them up. The systemd scripts are running, so that doesn't seem to be the issue. I don't see anything glaring in the log files, though that may just reflect my experience level with Ceph.

I tried to find and fix any errors that seemed obvious, but I haven't managed that either. The cluster was initially created with 64 PGs. I tried to increase that to 1024, but PG creation hasn't finished and seems stuck with 270 PGs in stale+creating. This prevents me from raising pgp_num, because Ceph reports that it is busy creating PGs.
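As a sanity check on the 1024 target, a widely quoted rule of thumb aims for roughly 100 PGs per OSD, i.e. (OSDs x 100) / replica count. The sketch below assumes a replicated pool of size 3 (a common default; the rbd pool's actual size isn't shown in this post), and the rounding step (nearest power of two) is also an assumption, since some guidance rounds up instead.

```python
import math


def suggested_pg_num(num_osds: int, pool_size: int, target_per_osd: int = 100) -> int:
    """Rule-of-thumb PG count: (OSDs * target PGs per OSD) / replicas.

    Assumption: the raw value is rounded to the *nearest* power of two.
    """
    raw = num_osds * target_per_osd / pool_size
    lower = 2 ** math.floor(math.log2(raw))  # power of two at or below raw
    upper = lower * 2                        # next power of two above it
    # Pick whichever power of two is closer to the raw value.
    return lower if (raw - lower) <= (upper - raw) else upper


if __name__ == "__main__":
    raw = 32 * 100 / 3  # 32 OSDs, assumed pool size 3
    print(f"raw target: {raw:.1f}")
    print("suggested pg_num:", suggested_pg_num(32, 3))
```

With 32 OSDs and an assumed size of 3 the raw target is about 1067, so 1024 is in line with this rule. Once PG creation completes, pgp_num would normally be raised to the same value (for example "ceph osd pool set rbd pgp_num 1024"), which is what clears the "pg_num 1024 > pgp_num 64" warning below.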

I suspect the down OSDs are the cause of the PG creation problem. I just can't find the reason they are going down. Could someone help shed some light on this for me?


[ceph@matm-cm1 ~]$ ceph status
    cluster 5a463eb9-b918-4d97-b853-7a5ebd3c0ac2
     health HEALTH_ERR
            1006 pgs are stuck inactive for more than 300 seconds
            1 pgs degraded
            140 pgs down
            736 pgs peering
            1024 pgs stale
            1006 pgs stuck inactive
            18 pgs stuck unclean
            1 pgs undersized
            pool rbd pg_num 1024 > pgp_num 64
     monmap e1: 3 mons at {matm-cm1=192.168.41.153:6789/0,matm-cm2=192.168.41.154:6789/0,matm-cm3=192.168.41.155:6789/0}
            election epoch 8, quorum 0,1,2 matm-cm1,matm-cm2,matm-cm3
     osdmap e417: 32 osds: 9 up, 9 in; 496 remapped pgs
            flags sortbitwise
      pgmap v1129: 1024 pgs, 1 pools, 0 bytes data, 0 objects
            413 MB used, 16753 GB / 16754 GB avail
                 564 stale+remapped+peering
                 270 stale+creating
                 125 stale+down+remapped+peering
                  32 stale+peering
                  17 stale+active+remapped
                  15 stale+down+peering
                   1 stale+active+undersized+degraded+remapped
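The pgmap state lines above can be cross-checked against the health summary by splitting each compound state on "+" and tallying the individual flags. The sketch below only parses the text pasted here; it does not talk to a cluster. Every one of the 1024 PGs carries "stale", which Ceph reports when the OSDs hosting a PG have not reported to the monitors for a while, consistent with most OSDs being down.

```python
import re
from collections import Counter

# pgmap state lines copied from the "ceph status" output above.
PGMAP_STATES = """\
564 stale+remapped+peering
270 stale+creating
125 stale+down+remapped+peering
32 stale+peering
17 stale+active+remapped
15 stale+down+peering
1 stale+active+undersized+degraded+remapped
"""


def tally_state_flags(text: str) -> Counter:
    """Count how many PGs carry each individual state flag.

    Each line is '<count> <flag>+<flag>+...'; a PG in a compound state
    contributes its count to every flag it carries.
    """
    totals = Counter()
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+([a-z_+]+)\s*$", line)
        if not m:
            continue  # skip anything that is not a "<count> <states>" pair
        count = int(m.group(1))
        for flag in m.group(2).split("+"):
            totals[flag] += count
    return totals


if __name__ == "__main__":
    for flag, count in tally_state_flags(PGMAP_STATES).most_common():
        print(f"{flag:12} {count}")
```

The tallies reproduce the health lines: 1024 stale, 736 peering, 140 down and 1 degraded.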

[ceph@matm-cm1 ~]$ ceph osd tree
ID WEIGHT   TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 58.17578 root default                                       
-2 14.54395     host matm-cs1                                  
 0  1.81799         osd.0        down        0          1.00000
 1  1.81799         osd.1        down        0          1.00000
 2  1.81799         osd.2        down        0          1.00000
 3  1.81799         osd.3        down        0          1.00000
 4  1.81799         osd.4        down        0          1.00000
 5  1.81799         osd.5        down        0          1.00000
 6  1.81799         osd.6        down        0          1.00000
 7  1.81799         osd.7        down        0          1.00000
-3 14.54395     host matm-cs2                                  
 8  1.81799         osd.8          up  1.00000          1.00000
 9  1.81799         osd.9          up  1.00000          1.00000
10  1.81799         osd.10         up  1.00000          1.00000
11  1.81799         osd.11         up  1.00000          1.00000
12  1.81799         osd.12         up  1.00000          1.00000
13  1.81799         osd.13         up  1.00000          1.00000
14  1.81799         osd.14         up  1.00000          1.00000
15  1.81799         osd.15         up  1.00000          1.00000
-4 14.54395     host matm-cs3                                  
16  1.81799         osd.16       down        0          1.00000
17  1.81799         osd.17       down        0          1.00000
18  1.81799         osd.18       down        0          1.00000
19  1.81799         osd.19       down        0          1.00000
20  1.81799         osd.20       down        0          1.00000
21  1.81799         osd.21       down        0          1.00000
22  1.81799         osd.22       down        0          1.00000
23  1.81799         osd.23       down        0          1.00000
-5 14.54395     host matm-cs4                                  
24  1.81799         osd.24       down        0          1.00000
31  1.81799         osd.31       down        0          1.00000
25  1.81799         osd.25       down        0          1.00000
27  1.81799         osd.27       down        0          1.00000
29  1.81799         osd.29       down        0          1.00000
28  1.81799         osd.28       down        0          1.00000
30  1.81799         osd.30         up  1.00000          1.00000
26  1.81799         osd.26       down        0          1.00000
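Summarizing the tree per host makes the failure pattern easier to see. The sketch below parses the plain-text "ceph osd tree" layout shown above (host rows contain the word "host"; OSD rows have an "osd.N" name followed by up/down); the column layout is an assumption taken from this output, and the script name in the usage line is hypothetical.

```python
import sys
from collections import defaultdict


def summarize_osd_tree(text: str) -> dict:
    """Return {host: (up, total)} from plain-text "ceph osd tree" output."""
    summary = defaultdict(lambda: [0, 0])  # host -> [up, total]
    host = None
    for line in text.splitlines():
        fields = line.split()
        if "host" in fields:
            # Host bucket row, e.g. "-2 14.54395 host matm-cs1"
            host = fields[fields.index("host") + 1]
        elif len(fields) >= 4 and fields[2].startswith("osd.") and host:
            # OSD row: ID, WEIGHT, NAME, UP/DOWN, REWEIGHT, PRIMARY-AFFINITY
            summary[host][1] += 1
            if fields[3] == "up":
                summary[host][0] += 1
    return {h: tuple(v) for h, v in summary.items()}


if __name__ == "__main__":
    for host, (up, total) in sorted(summarize_osd_tree(sys.stdin.read()).items()):
        print(f"{host}: {up}/{total} up")
```

Run as "ceph osd tree | python3 osd_tree_summary.py" (hypothetical file name), the tree above gives matm-cs1: 0/8, matm-cs2: 8/8, matm-cs3: 0/8 and matm-cs4: 1/8, i.e. the 9 up OSDs from "ceph status". Whole hosts being down while one host stays fully up may point toward a per-host cause rather than individual disks.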




Nate Curry

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
