Check your firewall rules. If the OSDs can't reach the monitors or each other on the OSD ports, their heartbeats fail and the monitors will mark them down.
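Something like the following should show whether the Ceph ports are actually open on each host. This is only a sketch and assumes firewalld on CentOS/RHEL with the default public zone and the default Ceph ports; adjust to whatever your hosts actually run:

# See what the firewall currently permits on an OSD host
sudo firewall-cmd --list-all

# Or, if the hosts use plain iptables:
sudo iptables -L -n

# Monitors listen on 6789/tcp and OSDs use 6800-7300/tcp by default,
# so opening those would look roughly like this:
sudo firewall-cmd --zone=public --add-port=6789/tcp --permanent
sudo firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent
sudo firewall-cmd --reload

If the OSDs stay up after that, the firewall was the culprit.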
On Fri, Apr 1, 2016 at 10:28 AM, Nate Curry <curry@xxxxxxxxxxxxx> wrote:
I am having some issues with my newly set up cluster. I am able to get all 32 of my OSDs to start after setting up udev rules for my journal partitions, but they keep going down. At first it seemed like half of them would stay up, but when I checked this morning only about a quarter of them were up according to "ceph osd tree". The systemd units are running, so that doesn't seem to be the issue, and I don't see anything glaring in the log files, which may just reflect my experience level with Ceph.
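For reference, this is roughly what I have been checking on each OSD host (the unit name and log path are the package defaults here, so adjust if yours differ):

# Confirm a given OSD daemon is still running (osd.0 as an example)
systemctl status ceph-osd@0

# Look at the tail of that OSD's log for anything obvious
tail -n 100 /var/log/ceph/ceph-osd.0.log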
I tried to track down the errors and knock out any that seemed obvious, but I haven't managed to do that either. The pool was initially created with 64 PGs and I tried to increase that to 1024, but the cluster hasn't finished creating them all and seems stuck with 270 pgs in stale+creating. That is preventing me from raising pgp_num to match, since it says it is busy creating pgs.
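For reference, the pool changes I am attempting look roughly like this:

# This was accepted and raised pg_num on the rbd pool to 1024
ceph osd pool set rbd pg_num 1024

# This is the one that gets refused while the new pgs are still being created
ceph osd pool set rbd pgp_num 1024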
I am thinking that the downed OSDs are probably the reason the pgs aren't getting created; I just can't seem to find out why they are going down. Could someone help shine some light on this for me?

[ceph@matm-cm1 ~]$ ceph status
    cluster 5a463eb9-b918-4d97-b853-7a5ebd3c0ac2
     health HEALTH_ERR
            1006 pgs are stuck inactive for more than 300 seconds
            1 pgs degraded
            140 pgs down
            736 pgs peering
            1024 pgs stale
            1006 pgs stuck inactive
            18 pgs stuck unclean
            1 pgs undersized
            pool rbd pg_num 1024 > pgp_num 64
     monmap e1: 3 mons at {matm-cm1=192.168.41.153:6789/0,matm-cm2=192.168.41.154:6789/0,matm-cm3=192.168.41.155:6789/0}
            election epoch 8, quorum 0,1,2 matm-cm1,matm-cm2,matm-cm3
     osdmap e417: 32 osds: 9 up, 9 in; 496 remapped pgs
            flags sortbitwise
      pgmap v1129: 1024 pgs, 1 pools, 0 bytes data, 0 objects
            413 MB used, 16753 GB / 16754 GB avail
                 564 stale+remapped+peering
                 270 stale+creating
                 125 stale+down+remapped+peering
                  32 stale+peering
                  17 stale+active+remapped
                  15 stale+down+peering
                   1 stale+active+undersized+degraded+remapped
[ceph@matm-cm1 ~]$ ceph osd tree
ID WEIGHT   TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 58.17578 root default
-2 14.54395     host matm-cs1
 0  1.81799         osd.0         down        0          1.00000
 1  1.81799         osd.1         down        0          1.00000
 2  1.81799         osd.2         down        0          1.00000
 3  1.81799         osd.3         down        0          1.00000
 4  1.81799         osd.4         down        0          1.00000
 5  1.81799         osd.5         down        0          1.00000
 6  1.81799         osd.6         down        0          1.00000
 7  1.81799         osd.7         down        0          1.00000
-3 14.54395     host matm-cs2
 8  1.81799         osd.8           up  1.00000          1.00000
 9  1.81799         osd.9           up  1.00000          1.00000
10  1.81799         osd.10          up  1.00000          1.00000
11  1.81799         osd.11          up  1.00000          1.00000
12  1.81799         osd.12          up  1.00000          1.00000
13  1.81799         osd.13          up  1.00000          1.00000
14  1.81799         osd.14          up  1.00000          1.00000
15  1.81799         osd.15          up  1.00000          1.00000
-4 14.54395     host matm-cs3
16  1.81799         osd.16        down        0          1.00000
17  1.81799         osd.17        down        0          1.00000
18  1.81799         osd.18        down        0          1.00000
19  1.81799         osd.19        down        0          1.00000
20  1.81799         osd.20        down        0          1.00000
21  1.81799         osd.21        down        0          1.00000
22  1.81799         osd.22        down        0          1.00000
23  1.81799         osd.23        down        0          1.00000
-5 14.54395     host matm-cs4
24  1.81799         osd.24        down        0          1.00000
31  1.81799         osd.31        down        0          1.00000
25  1.81799         osd.25        down        0          1.00000
27  1.81799         osd.27        down        0          1.00000
29  1.81799         osd.29        down        0          1.00000
28  1.81799         osd.28        down        0          1.00000
30  1.81799         osd.30          up  1.00000          1.00000
26  1.81799         osd.26        down        0          1.00000

Nate Curry
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com