Ok, probably hitting this:
http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
(the flapping OSD part)...

Cheers,
Robert
________________________________________
From: ceph-users-bounces@xxxxxxxxxxxxxx [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Robert van Leeuwen [Robert.vanLeeuwen@xxxxxxxxxxxxx]
Sent: Tuesday, November 19, 2013 3:22 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: OSDs marking itself down and reconnecting back to the cluster

Hi,

I've set up a new cluster as follows:
3 x MON node
3 x OSD node with dual SSD for journal and 10 x 1TB disk
2 x 1GB ethernet, two networks with cluster & public configured

When running rados bench (from a MON or OSD node) I've noticed the OSDs marking themselves down.
They do seem to reconnect to the cluster immediately, and all OSDs on a single physical node follow in rapid succession.
All three physical nodes have the same issue.

The machines are running SL6.4 with ceph-0.67.4-0.el6.x86_64.
Any suggestions as to what may be happening?
Here is an example:

# ceph -w|grep down
2013-11-19 13:45:09.485754 mon.0 [INF] osd.16 marked itself down
2013-11-19 13:45:12.290152 mon.0 [INF] osd.18 marked itself down
2013-11-19 13:45:15.071228 mon.0 [INF] osd.15 marked itself down
2013-11-19 13:45:17.869570 mon.0 [INF] osd.19 marked itself down
2013-11-19 13:45:20.629192 mon.0 [INF] osd.17 marked itself down
2013-11-19 13:45:23.471693 mon.0 [INF] osd.12 marked itself down
2013-11-19 13:45:25.270015 mon.0 [INF] osd.10 marked itself down
2013-11-19 13:45:28.260899 mon.0 [INF] osd.14 marked itself down
2013-11-19 13:45:31.033834 mon.0 [INF] osd.11 marked itself down
2013-11-19 13:46:53.039820 mon.0 [INF] osd.3 marked itself down
2013-11-19 13:46:55.987553 mon.0 [INF] osd.2 marked itself down
2013-11-19 13:46:58.682728 mon.0 [INF] osd.9 marked itself down
2013-11-19 13:47:00.539705 mon.0 [INF] osd.6 marked itself down
2013-11-19 13:47:03.351333 mon.0 [INF] osd.8 marked itself down
2013-11-19 13:47:06.109042 mon.0 [INF] osd.0 marked itself down
2013-11-19 13:47:08.887278 mon.0 [INF] osd.1 marked itself down
2013-11-19 13:47:10.784297 mon.0 [INF] osd.5 marked itself down
2013-11-19 13:47:13.623931 mon.0 [INF] osd.4 marked itself down
2013-11-19 13:47:16.431803 mon.0 [INF] osd.7 marked itself down
2013-11-19 13:48:40.694596 7f15a6192700 0 monclient: hunting for new mon

Thx,
Robert van Leeuwen
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
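
For anyone wanting to check the "rapid succession" timing in `ceph -w` output like the example in this thread, here is a minimal Python sketch that groups the "marked itself down" events into bursts. The function names and the 30-second gap threshold are my own assumptions for illustration, not anything defined by Ceph, and the regular expression only covers the log line format shown in this message.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
#   2013-11-19 13:45:09.485754 mon.0 [INF] osd.16 marked itself down
# (format taken from the log excerpt in this thread; other Ceph
# versions may print these events differently)
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+ \[INF\] "
    r"osd\.(\d+) marked itself down"
)


def parse_down_events(lines):
    """Return a list of (timestamp, osd_id) for each matching line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, int(m.group(2))))
    return events


def group_bursts(events, max_gap=timedelta(seconds=30)):
    """Group events into bursts; a new burst starts whenever the gap to
    the previous event exceeds max_gap (an arbitrary, assumed threshold)."""
    bursts = []
    for ts, osd in sorted(events):
        if bursts and ts - bursts[-1][-1][0] <= max_gap:
            bursts[-1].append((ts, osd))
        else:
            bursts.append([(ts, osd)])
    return bursts
```

Feeding the 19 "marked itself down" lines from the example into `group_bursts(parse_down_events(lines))` should give two bursts (9 OSDs, then 10 OSDs) separated by roughly 80 seconds, which would fit whole nodes dropping out one after another. Mapping OSD ids to hosts (for instance with `ceph osd tree`) would be needed to confirm that.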