Cluster network failure, osd declared up

Hi, 

consider the following scenario:
  • cluster with separate public and cluster networks
  • three-node cluster
  • 5 OSDs per node
  • 1 mon per node
  • two nodes attached to the same 10Gb switch, cluster network (room A)
  • one node attached to another 10Gb switch, cluster network (room B)
  • no redundancy between the 10Gb cluster-network switches
  • redundant public network (1Gb)
Cause:

The 10Gb switch (cluster network) in room A goes down (maintenance, power loss, etc.).
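To make the failure concrete, here is a rough sketch of the cluster-network topology described above. It only models link reachability between nodes, not anything Ceph itself does. The node names are made up, and it assumes the two switches are joined by a single uplink, so a node pair can talk only if the switches of both nodes are up.

```python
from itertools import combinations

# Which cluster-network switch each node hangs off (from the scenario above).
# The node names are hypothetical.
NODE_SWITCH = {"node1": "A", "node2": "A", "node3": "B"}


def reachable(a, b, failed):
    """True if nodes a and b can still talk over the cluster network.

    Assumption: switches A and B are joined by a single uplink, so the
    path between two nodes works only if both of their switches are up.
    """
    return NODE_SWITCH[a] not in failed and NODE_SWITCH[b] not in failed


def broken_pairs(failed_switches):
    """Node pairs that lose cluster-network connectivity."""
    failed = set(failed_switches)
    return [
        (a, b)
        for a, b in combinations(sorted(NODE_SWITCH), 2)
        if not reachable(a, b, failed)
    ]


def isolated_nodes(failed_switches):
    """Nodes that cannot reach any other node over the cluster network."""
    failed = set(failed_switches)
    nodes = sorted(NODE_SWITCH)
    return [
        n for n in nodes
        if not any(reachable(n, m, failed) for m in nodes if m != n)
    ]


if __name__ == "__main__":
    print("broken pairs:", broken_pairs(["A"]))
    print("isolated nodes:", isolated_nodes(["A"]))
```

Under these assumptions, losing switch A cuts every node off from every other node on the cluster network, including the node in room B, whose own switch is still up but has no peers left behind it.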

Problem:

Only 4 of the 5 OSDs on the second node were declared down, while all 5 OSDs on the first node were still declared up.
Client I/O stayed stuck until we manually stopped the OSDs on the first node.

This is our ceph.conf configuration:

...
public network = 10.x.x.x/24
cluster network = 172.x.x.x/24
...
mon osd report timeout = 15
mon osd down out interval = 600
...
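For anyone who wants to double-check the values in use, below is a small sketch that reads the options shown above out of an ini-style fragment. The [global] section header and the load_options helper are assumptions added for the example, and the addresses are the same placeholders as in our file. As far as I know both timing options are expressed in seconds, and Ceph accepts spaces and underscores interchangeably in option names, so the keys are normalized to underscores.

```python
import configparser

# Fragment mirroring the options quoted above. The [global] header is an
# assumption; the network values are the same placeholders as in the post.
SAMPLE = """
[global]
public network = 10.x.x.x/24
cluster network = 172.x.x.x/24
mon osd report timeout = 15
mon osd down out interval = 600
"""


def load_options(text, section="global"):
    """Return the options of one section with spaces turned into underscores."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {key.replace(" ", "_"): value for key, value in parser[section].items()}


opts = load_options(SAMPLE)
report_timeout = int(opts["mon_osd_report_timeout"])
down_out_interval = int(opts["mon_osd_down_out_interval"])

if __name__ == "__main__":
    print("cluster network:", opts["cluster_network"])
    print("mon osd report timeout (s):", report_timeout)
    print("mon osd down out interval (s):", down_out_interval)
```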

The documentation says:

If you declare a cluster network, OSDs will route heartbeat, object replication and recovery traffic over the cluster network. This may improve performance compared to using a single network. To configure a cluster network, add the following option to the [global] section of your Ceph configuration file. 

So why was Ceph not able to automatically mark the isolated OSDs down?

Lorenzo
--
Lorenzo Garuti
CED MaxMara
email: garuti.l@xxxxxxxxxx
tel: 0522 3993772 - 335 8416054