Re: After reboot nothing worked

Karan Singh <ksingh@xxxxxx> · Tue, 17 Dec 2013 11:54:26 +0200 (EET)

Umar

Ceph is stable for production. A large number of Ceph clusters are deployed and running smoothly in production, and countless more in testing and pre-production.

The fact that you are facing problems in your Ceph testing does not mean Ceph is unstable.

I would suggest putting some time into troubleshooting your problem.

What I see from your logs:

 1) You have 2 mons, and that is a problem: a quorum needs a majority of the monitors, so with 2 both must be up and you gain nothing over 1. Either have 1 or have 3 to form a quorum. Add 1 more monitor node.
 2) Out of 2 OSDs, only 1 is IN. Check where the other one is and try bringing both of them UP. Add a few more OSDs to remove the health warning; 2 OSDs is very few.
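
The monitor advice in point 1 comes down to majority arithmetic. A minimal sketch of it in plain shell follows; the function names quorum_size and tolerated_failures are made up for illustration and are not Ceph commands.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; these are not part of Ceph.

# Smallest strict majority of N monitors.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# How many monitors can be lost while a majority still remains.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
    echo "mons=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

By this arithmetic 2 monitors tolerate zero failures, the same as 1, while 3 tolerate one failure.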

Many Thanks
Karan Singh

From: "Umar Draz" <unix.co@xxxxxxxxx>
To: ceph-users@xxxxxxxx
Sent: Tuesday, 17 December, 2013 8:51:27 AM
Subject:  After reboot nothing worked

Hello,
I have a 2-node Ceph cluster. I rebooted both hosts just to test whether the cluster would keep working after a reboot, and the result was that the cluster was unable to start.

Here is the ceph -s output:

     health HEALTH_WARN 704 pgs stale; 704 pgs stuck stale; mds cluster is degraded; 1/1 in osds are down; clock skew detected on mon.kvm2
     monmap e2: 2 mons at {kvm1=192.168.214.10:6789/0,kvm2=192.168.214.11:6789/0}, election epoch 16, quorum 0,1 kvm1,kvm2
     mdsmap e13: 1/1/1 up {0=kvm1=up:replay}
     osdmap e29: 2 osds: 0 up, 1 in
      pgmap v68: 704 pgs, 4 pools, 9603 bytes data, 23 objects
            1062 MB used, 80816 MB / 81879 MB avail
                 704 stale+active+clean

I followed this useless documentation:

http://ceph.com/docs/master/rados/operations/monitoring-osd-pg/

I tried ceph osd tree.

The output was:

# id    weight  type name       up/down reweight
-1      0.16    root default
-2      0.07999         host kvm1
0       0.07999                 osd.0   down    1
-3      0.07999         host kvm2
1       0.07999                 osd.1   down    0
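
A tree like the one above can be filtered for the OSDs that are down. The following is only a sketch: down_osds is a hypothetical helper, not a Ceph command, and it assumes the column layout shown above (other Ceph releases may print different columns).

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of Ceph.
# Reads "ceph osd tree" text in the column layout shown above on stdin
# and prints the name of every OSD whose up/down column says "down".
down_osds() {
    awk '$3 ~ /^osd\.[0-9]+$/ && $4 == "down" { print $3 }'
}

# Sample input copied from the output above.
down_osds <<'EOF'
# id    weight  type name       up/down reweight
-1      0.16    root default
-2      0.07999         host kvm1
0       0.07999                 osd.0   down    1
-3      0.07999         host kvm2
1       0.07999                 osd.1   down    0
EOF
```

On a running cluster the same function could be fed with: ceph osd tree | down_osds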

Then I tried

sudo /etc/init.d/ceph -a start osd.0
sudo /etc/init.d/ceph -a start osd.1

to start the OSDs on both hosts. The result was:

/etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )

/etc/init.d/ceph: osd.1 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )
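
The empty "defines" lists in these errors suggest the init script found no daemons defined in either place it names. A rough way to look at both places by hand is sketched below; list_defined_osds is a hypothetical helper, not a Ceph command, and the default paths are simply the two named in the error message.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of Ceph.
# Usage: list_defined_osds [CONF] [DATA_DIR]
list_defined_osds() {
    conf=${1:-/etc/ceph/ceph.conf}
    data=${2:-/var/lib/ceph}

    echo "OSD sections in $conf:"
    if [ -r "$conf" ]; then
        # Match section headers such as [osd.0]
        grep -E '^[[:space:]]*\[osd\.[0-9]+\]' "$conf" || echo "  (none)"
    else
        echo "  (file not readable)"
    fi

    echo "OSD data directories under $data/osd:"
    found=0
    for d in "$data"/osd/*/; do
        [ -d "$d" ] || continue
        echo "  $d"
        found=1
    done
    [ "$found" -eq 1 ] || echo "  (none)"
}

list_defined_osds
```

If both lists come back empty after a reboot, one possibility worth checking is that the OSD data partitions were not mounted again at boot. That is an assumption to verify, not something the errors above prove.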

Now the question is: what is this? Is Ceph really stable? Can we use it in a production environment?

Both of my hosts have ntp running and the time is up to date.

Br.

Umar

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com