Run 'ceph-osd -i 0 -f' in a console and see what the output is.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Fredy Neeser
Sent: Tuesday, July 07, 2015 9:15 AM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Ceph OSDs are down and cannot be started

Hi,

I had a working Ceph Hammer test setup with 3 OSDs and 1 MON (running on VMs), and RBD was working fine.

The setup was not touched for two weeks (also no I/O activity), and when I looked again, the cluster was in a bad state:

On the MON node (sto-vm20):

$ ceph health
HEALTH_WARN 72 pgs stale; 72 pgs stuck stale; 3/3 in osds are down

$ ceph health detail
HEALTH_WARN 72 pgs stale; 72 pgs stuck stale; 3/3 in osds are down
pg 0.22 is stuck stale for 1457679.263525, current state stale+active+clean, last acting [2,1,0]
pg 0.21 is stuck stale for 1457679.263529, current state stale+active+clean, last acting [1,2,0]
pg 0.20 is stuck stale for 1457679.263531, current state stale+active+clean, last acting [1,0,2]
pg 0.1f is stuck stale for 1457679.263533, current state stale+active+clean, last acting [2,0,1]
...
pg 0.24 is stuck stale for 1457679.263625, current state stale+active+clean, last acting [2,0,1]
pg 0.23 is stuck stale for 1457679.263627, current state stale+active+clean, last acting [1,2,0]
osd.0 is down since epoch 16, last address 9.4.68.111:6800/1658
osd.1 is down since epoch 16, last address 9.4.68.112:6800/1659
osd.2 is down since epoch 16, last address 9.4.68.113:6800/1654

On the OSD nodes (sto-vm21, sto-vm22, sto-vm23), no Ceph daemon is running:

$ ps -ef | egrep "ceph|osd|rados"
(returns nothing)

I rebooted the OSDs as well as the MON, but still only the ceph-mon daemon is running on the MON node.

I tried to start the OSDs manually by executing

$ sudo /etc/init.d/ceph start osd

on the OSD nodes, but I saw neither an error message nor a logfile update.
On the OSD nodes, the log files in /var/log/ceph have not been updated since the failure event.

What is strange is that the OSDs no longer have any admin socket files (which should normally be in /run/ceph), whereas the MON node does have an admin socket:

$ ls -la /run/ceph
srwxr-xr-x 1 root root 0 Jul 7 15:27 ceph-mon.sto-vm20.asok

This looks very similar to

http://tracker.ceph.com/issues/7188
Bug #7188: Admin socket files are lost on log rotation calling initctl reload (ubuntu 13.04 only)

Any ideas on how to restart / recover the OSDs would be much appreciated. How can I start the OSD daemon(s) such that I can see any errors?

Thanks,
- Fredy

PS: The Ceph setup is on Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-41-generic x86_64)

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
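
The suggestion at the top of this thread (run the OSD in the foreground so that startup errors show up on the console) can be combined with the quoted health output. The following is only a rough sketch: down_osd_ids is a hypothetical helper, not a Ceph tool, and the sample text is copied from the "ceph health detail" output above. The script does not start anything; it only prints the foreground command for each OSD reported as down. Each printed command would have to be run on the node that hosts that OSD.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): read `ceph health detail` text on
# stdin and print the numeric id of every OSD reported as down, one per line.
down_osd_ids() {
    sed -n 's/^osd\.\([0-9][0-9]*\) is down since epoch.*/\1/p'
}

# Sample input copied from the health output quoted in this thread.
sample='HEALTH_WARN 72 pgs stale; 72 pgs stuck stale; 3/3 in osds are down
pg 0.22 is stuck stale for 1457679.263525, current state stale+active+clean, last acting [2,1,0]
osd.0 is down since epoch 16, last address 9.4.68.111:6800/1658
osd.1 is down since epoch 16, last address 9.4.68.112:6800/1659
osd.2 is down since epoch 16, last address 9.4.68.113:6800/1654'

# Echo the foreground command from the reply for each down OSD.
# Nothing is executed against the cluster here.
printf '%s\n' "$sample" | down_osd_ids | while read -r id; do
    echo "ceph-osd -i $id -f"
done
```

On a live MON node the sample text could presumably be replaced by piping the real output, as in "ceph health detail | down_osd_ids".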