Peering or connections problem

Hi List,

I have a Ceph setup consisting of 3 nodes: 1 mon and 2 OSDs. It seems that both my OSDs are in but down. The ceph-osd processes on the OSD nodes are running and listening, and I can successfully telnet from every node to every other node on the respective ports. Still, my PGs are all stuck in this status:

pg 2.2 is stuck unclean since forever, current state creating, last acting []
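
For reference, I am pulling this status with commands along these lines (output omitted here; the full maps are in the 'ceph report' paste below):

ceph -s                      # overall cluster status
ceph health detail           # prints the stuck-PG lines like the one above
ceph osd tree                # shows both OSDs as down
ceph pg dump_stuck unclean   # the stuck PGs in tabular form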


Here is my ceph.conf:

http://pastebin.com/SpQA38Em
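
Judging by the netstat output further down, the OSDs bind to both the 192.168.10.0/24 and the 10.10.10.0/24 networks, i.e. the conf defines separate public and cluster networks, roughly of this form (hypothetical values for illustration; the exact settings are in the paste above):

[global]
    public network  = 192.168.10.0/24   # mon/client-facing traffic
    cluster network = 10.10.10.0/24     # OSD-to-OSD replication traffic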

And here is what 'ceph report' has to say:

http://pastebin.com/3gPJhpnH


This is what the OSD log (osd.0) shows; note that the peers list in the heartbeat lines stays empty:

2014-01-13 17:23:32.638235 7f98486a7700  5 osd.0 16 tick
2014-01-13 17:23:32.638270 7f98486a7700 10 osd.0 16 do_waiters -- start
2014-01-13 17:23:32.638273 7f98486a7700 10 osd.0 16 do_waiters -- finish
2014-01-13 17:23:32.657880 7f9836e63700 20 osd.0 16 update_osd_stat osd_stat(1057 MB used, 29646 MB avail, 30704 MB total, peers []/[] op hist [])
2014-01-13 17:23:32.657935 7f9836e63700  5 osd.0 16 heartbeat: osd_stat(1057 MB used, 29646 MB avail, 30704 MB total, peers []/[] op hist [])
2014-01-13 17:23:33.638437 7f98486a7700  5 osd.0 16 tick
2014-01-13 17:23:33.638475 7f98486a7700 10 osd.0 16 do_waiters -- start
2014-01-13 17:23:33.638479 7f98486a7700 10 osd.0 16 do_waiters -- finish
2014-01-13 17:23:33.758194 7f9836e63700 20 osd.0 16 update_osd_stat osd_stat(1057 MB used, 29646 MB avail, 30704 MB total, peers []/[] op hist [])
2014-01-13 17:23:33.758257 7f9836e63700  5 osd.0 16 heartbeat: osd_stat(1057 MB used, 29646 MB avail, 30704 MB total, peers []/[] op hist [])
2014-01-13 17:23:34.638658 7f98486a7700  5 osd.0 16 tick
2014-01-13 17:23:34.638692 7f98486a7700 10 osd.0 16 do_waiters -- start
2014-01-13 17:23:34.638694 7f98486a7700 10 osd.0 16 do_waiters -- finish
2014-01-13 17:23:35.638936 7f98486a7700  5 osd.0 16 tick
.
.
.


And this is what the mon log says:

2014-01-13 17:25:21.670754 7f10474b4700 11 mon.ceph0@0(leader) e1 tick
2014-01-13 17:25:21.670792 7f10474b4700 10 mon.ceph0@0(leader).pg v8 v8: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
2014-01-13 17:25:21.670821 7f10474b4700 10 mon.ceph0@0(leader).mds e1 e1: 0/0/1 up
2014-01-13 17:25:21.670831 7f10474b4700 10 mon.ceph0@0(leader).osd e7 e7: 2 osds: 0 up, 2 in
2014-01-13 17:25:21.670839 7f10474b4700 20 mon.ceph0@0(leader).osd e7 osd.0 laggy halflife 3600 decay_k -0.000192541 down for 5.000466 decay 0.999038
2014-01-13 17:25:21.670876 7f10474b4700 10 mon.ceph0@0(leader).osd e7 tick entire containing rack subtree for osd.0 is down; resetting timer
2014-01-13 17:25:21.670881 7f10474b4700 20 mon.ceph0@0(leader).osd e7 osd.1 laggy halflife 3600 decay_k -0.000192541 down for 5.000466 decay 0.999038
2014-01-13 17:25:21.670890 7f10474b4700 10 mon.ceph0@0(leader).osd e7 tick entire containing rack subtree for osd.1 is down; resetting timer
2014-01-13 17:25:21.670895 7f10474b4700  1 mon.ceph0@0(leader).paxos(paxos active c 1..260) is_readable now=2014-01-13 17:25:21.670896 lease_expire=0.000000 has v0 lc 260
2014-01-13 17:25:21.670917 7f10474b4700  1 mon.ceph0@0(leader).paxos(paxos active c 1..260) is_readable now=2014-01-13 17:25:21.670918 lease_expire=0.000000 has v0 lc 260
2014-01-13 17:25:21.670927 7f10474b4700  1 mon.ceph0@0(leader).paxos(paxos active c 1..260) is_readable now=2014-01-13 17:25:21.670928 lease_expire=0.000000 has v0 lc 260
2014-01-13 17:25:21.670934 7f10474b4700 10 mon.ceph0@0(leader).log v36 log
2014-01-13 17:25:21.670939 7f10474b4700 10 mon.ceph0@0(leader).auth v207 auth
2014-01-13 17:25:21.670951 7f10474b4700 20 mon.ceph0@0(leader) e1 sync_trim_providers

This is what 'ps aux | grep ceph' yields on each respective node:

mon.0

root      6567  0.0  0.4 156984 13084 ?        Sl   04:41   0:07 /usr/bin/ceph-mon -i ceph0 --pid-file /var/run/ceph/mon.ceph0.pid -c /etc/ceph/ceph.conf

osd.0
root      3435  0.0  0.6 488344 20140 ?        Ssl  04:41   0:26 /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf

osd.1
root      2926  0.0  0.6 487080 18912 ?        Ssl  04:41   0:29 /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf

This is what 'netstat -tapn | grep -i listen | grep ceph' yields on each respective node:

mon.0
tcp        0      0 192.168.10.200:6789     0.0.0.0:*               LISTEN      6567/ceph-mon  

osd.0
tcp        0      0 10.10.10.201:6800       0.0.0.0:*               LISTEN      3435/ceph-osd  
tcp        0      0 192.168.10.201:6800     0.0.0.0:*               LISTEN      3435/ceph-osd  
tcp        0      0 192.168.10.201:6801     0.0.0.0:*               LISTEN      3435/ceph-osd  
tcp        0      0 10.10.10.201:6801       0.0.0.0:*               LISTEN      3435/ceph-osd  
tcp        0      0 192.168.10.201:6802     0.0.0.0:*               LISTEN      3435/ceph-osd

osd.1
tcp        0      0 10.10.10.202:6800       0.0.0.0:*               LISTEN      2926/ceph-osd  
tcp        0      0 192.168.10.202:6800     0.0.0.0:*               LISTEN      2926/ceph-osd  
tcp        0      0 192.168.10.202:6801     0.0.0.0:*               LISTEN      2926/ceph-osd  
tcp        0      0 10.10.10.202:6801       0.0.0.0:*               LISTEN      2926/ceph-osd  
tcp        0      0 192.168.10.202:6802     0.0.0.0:*               LISTEN      2926/ceph-osd
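
The telnet tests mentioned at the top were along these lines, run in both directions between all three nodes (addresses and ports taken from the listings above):

telnet 192.168.10.200 6789   # mon port, from each OSD node
telnet 192.168.10.201 6800   # osd.0 public address, from the mon node and osd.1
telnet 10.10.10.202 6801     # osd.1 cluster address, from osd.0

All of these connect, so plain TCP reachability does not seem to be the problem.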


Thank you.

Best,
Moe
1984