Re: failure of public network kills connectivity

On 01/05/2016 07:59 PM, Adrian Imboden wrote:
> Hi
> 
> I recently set up a small Ceph cluster at home for testing and private
> purposes.
> It works really well, but I have a problem that may come from my
> small-scale configuration.
> 
> All nodes are running Ubuntu 14.04 and Ceph Infernalis 9.2.0.
> 
> I have two networks as recommended:
> cluster network: 10.10.128.0/24
> public network: 10.10.48.0/22
> 
> All ip addresses are configured statically.
> 
> The behavior I see is that when I run the rados benchmark
> (e.g. "rados bench -p data 300 write"), both the public and the cluster
> network are used to transmit the data (about 50%/50%).
> 
> When I disconnect the cluster from the public network, the connection
> between the OSDs is lost, while the monitors keep seeing each other:
> HEALTH_WARN 129 pgs degraded; 127 pgs stale; 129 pgs undersized;
> recovery 1885/7316 objects degraded (25.765%); 6/8 in osds are down
> 

The cluster network is only used for replication and recovery traffic
between OSDs. The monitors are not present on the cluster network; they
only live on the public network.
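
A quick way to verify this is "ceph osd dump", which prints both the
public and the cluster address of every OSD. Roughly, with your
addressing (the exact output layout differs a bit per release, and the
ports/nonces below are made up):

$ ceph osd dump | grep ^osd
osd.0 up in weight 1 ... 10.10.49.1:6800/2301 10.10.128.1:6801/2301 ...

The first address is the public one that clients and the monitors talk
to, the second is the cluster address used for replication.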

I usually don't see a good reason to use a separate cluster network,
since it only adds another failure domain.

A single network for Ceph works just fine. In most cases I see,
bandwidth is not the problem; latency usually is.

My advice: do not over-engineer, and stick with a single network. It
makes life a lot easier.
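
If you do go that route, the change is small. A minimal sketch of the
[global] section with your fsid and public subnet (the cluster network
line is simply dropped, the monitor addresses move to the remaining
subnet, and the OSDs need a restart to rebind; note that moving the
monitors to new addresses means updating the monmap, so this is easiest
when (re)deploying):

[global]
fsid = 64599def-5741-4bda-8ce5-31a85af884bb
mon initial members = node1 node3 node2 node4
mon host = 10.10.49.1 10.10.49.3 10.10.49.2 10.10.49.4
public network = 10.10.48.0/22
# no "cluster network" line: client, monitor and replication
# traffic all use 10.10.48.0/22
auth cluster required = cephx
auth service required = cephx
auth client required = cephx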

Wido

> 
> What I expected is that only the cluster network is used when a Ceph
> node itself reads or writes data.
> Furthermore, I expected that a failure of the public network would not
> affect the connectivity of the nodes themselves.
> 
> What do I not yet understand, or what am I configuring the wrong way?
> 
> I plan to run KVM on these same nodes beside the storage cluster, as it
> is only a small setup. That's the reason why I am a little concerned
> about this behaviour.
> 
> 
> This is how it is setup:
> |- node1 (cluster: 10.10.128.1, public: 10.10.49.1)
> |  |- osd
> |  |- osd
> |  |- mon
> |
> |- node2 (cluster: 10.10.128.2, public: 10.10.49.2)
> |  |- osd
> |  |- osd
> |  |- mon
> |
> |- node3 (cluster: 10.10.128.3, public: 10.10.49.3)
> |  |- osd
> |  |- osd
> |  |- mon
> |
> |- node4 (cluster: 10.10.128.4, public: 10.10.49.4)
> |  |- osd
> |  |- osd
> |
> 
> This is my ceph config:
> 
> [global]
> auth supported = cephx
> 
> fsid = 64599def-5741-4bda-8ce5-31a85af884bb
> mon initial members = node1 node3 node2 node4
> mon host = 10.10.128.1 10.10.128.3 10.10.128.2 10.10.128.4
> public network = 10.10.48.0/22
> cluster network = 10.10.128.0/24
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> 
> [mon.node1]
>     host = node1
>     mon data = /var/lib/ceph/mon/ceph-node1/
> 
> [mon.node3]
>     host = node3
>     mon data = /var/lib/ceph/mon/ceph-node3/
> 
> [mon.node2]
>     host = node2
>     mon data = /var/lib/ceph/mon/ceph-node2/
> 
> [mon.node4]
>     host = node4
>     mon data = /var/lib/ceph/mon/ceph-node4/
> 
> 
> Thank you very much
> 
> Greetings
> Adrian


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com