Re: Help! 61.1 killed my monitors in prod

Stephen Street <sgs@xxxxxxxxxxxxxxxxxxxxxx> · Mon, 13 May 2013 21:15:38 -0700

Joao,

On May 13, 2013, at 3:24 PM, Stephen Street <sgs@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> 
> From the logs, it appears that the monitors are struggling to bind to the network at system start. If I issue an initctl restart ceph-mon-all on all nodes running monitors, the system starts functioning correctly.
> 

I found the issue. My nodes have two Ethernet interfaces (eth0 and eth1), and both are configured to use static DHCP leases. My cluster is configured to use addresses on the eth0 network (192.168.139.0/24). The upstart job /etc/init/ceph-all.conf contains the following line:

	start on (local-filesystems and net-device-up IFACE!=lo)

That condition is satisfied by a net-device-up event from any interface other than lo. It appears that eth1 emits its net-device-up event before eth0 does, so the ceph-all upstart job begins running before the desired network is available, which leads to the address binding error seen in the log. I changed the upstart job line to:

	start on (local-filesystems and net-device-up IFACE=eth0)

and the cluster cold starts successfully.
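
To make the ordering problem concrete, here is a toy Python model of the two start conditions. It is only a sketch: the function names and the assumed boot order (lo, then eth1, then eth0) are made up for illustration, and it does not reflect how upstart actually evaluates events.

```python
# Toy model of the two "start on" conditions; NOT upstart's real event matcher.
# All names here (matches, first_trigger, boot_order) are hypothetical.

def matches(condition, iface):
    """Return True if a net-device-up event for `iface` satisfies `condition`.

    `condition` is an (operator, name) pair, e.g. ("!=", "lo") or ("=", "eth0").
    """
    op, name = condition
    return iface != name if op == "!=" else iface == name


def first_trigger(condition, events):
    """Return the first interface whose net-device-up event matches, or None."""
    for iface in events:
        if matches(condition, iface):
            return iface
    return None


# Assumed boot order, based on the behaviour described above:
# eth1 reports up before eth0.
boot_order = ["lo", "eth1", "eth0"]

original = first_trigger(("!=", "lo"), boot_order)   # IFACE!=lo
fixed = first_trigger(("=", "eth0"), boot_order)     # IFACE=eth0
print(original, fixed)
```

With that assumed order, the original condition is first satisfied by eth1, while the changed condition waits for eth0, which is the interface the monitors need to bind to.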

Thanks for your help,
Stephen

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com