Re: Ceph cluster network failure impact

Christian Balzer <chibi@xxxxxxx> · Tue, 30 Aug 2016 13:21:02 +0900

Hello,

On Mon, 29 Aug 2016 16:16:11 -0700 Eric Kolb wrote:

> Hello,
> 
> Have read a few items about what occurs if the back-end cluster switch 
> were to fail or be rebooted due to code updates. From the 
> Troubleshooting OSDs guide 
> (http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-osd/) 
> it states, "if the cluster (back-end) network fails or develops 
> significant latency while the public (front-end) network operates 
> optimally, OSDs currently do not handle this situation well".
> 
That's putting it very nicely.

> May someone have any experience with this scenario they may be able to 
> pass along?
> 
No personal experience, as I strive to avoid that scenario at all costs.

If the "few items" you read contained mails by me, then the following will
sound familiar:

1. Why split the network in the first place?
From a bandwidth perspective, it only makes sense if your OSDs can write
faster than the combined bandwidth. 
If you're thinking about segregating the networks for policy reasons,
still use a unified network but with VLANs. 
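
As a rough way to check the bandwidth argument, here is a minimal sketch.
It reads "combined bandwidth" as the total bandwidth of the links you would
otherwise pool into one unified network; that reading, the function name and
all the numbers are assumptions for illustration, not measurements.

```python
def split_network_worthwhile(osd_count, osd_write_mb_s, combined_gbit_s):
    """Return True if the node's OSDs can write faster than the network.

    osd_count       -- number of OSDs in the node (hypothetical value)
    osd_write_mb_s  -- sustained write speed per OSD in MB/s (hypothetical)
    combined_gbit_s -- bandwidth of the unified network in Gbit/s
    """
    # 1 Gbit/s is 125 MB/s; protocol overhead is ignored here.
    network_mb_s = combined_gbit_s * 125
    aggregate_write_mb_s = osd_count * osd_write_mb_s
    return aggregate_write_mb_s > network_mb_s


# Hypothetical node: 8 HDD OSDs at ~100 MB/s each, 10 Gbit/s network.
print(split_network_worthwhile(8, 100, 10))    # 800 MB/s vs 1250 MB/s
# Hypothetical node: 12 SSD OSDs at ~400 MB/s each, same network.
print(split_network_worthwhile(12, 400, 10))   # 4800 MB/s vs 1250 MB/s
```

In the first case the disks cannot fill the network, so a split buys
nothing; only in the second case is the network the limit.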

2. Avoid failures.
Since you're already looking at (at least) 2 network interfaces, avoid a
node loss due to interface or switch failures entirely,
either by using Active-Standby failover (less bandwidth, but the cheapest
switches will do) or the more advantageous LACP with MC-LAG switches
(full bandwidth if both switches are up, still one link's worth of
bandwidth if one goes down).
The latter service level can also be achieved by routing (OSPF/BGP) on the
hosts, something that was discussed on this list as well. 
It's more involved, but can use cheap switches as well.
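
To put numbers on the two bonding options, a small sketch follows. It
assumes two links per node, one to each switch; the function and mode names
are made up for illustration. The LACP figure is an upper bound across many
flows, since a single flow normally stays on one link.

```python
def usable_bandwidth_gbit_s(mode, link_gbit_s, switches_up):
    """Usable node bandwidth with two links, one per switch (assumption).

    mode        -- "active-standby" or "lacp-mclag" (hypothetical labels)
    link_gbit_s -- speed of a single link in Gbit/s
    switches_up -- how many of the two switches are currently up (0, 1, 2)
    """
    if switches_up not in (0, 1, 2):
        raise ValueError("switches_up must be 0, 1 or 2")
    if switches_up == 0:
        return 0  # both paths gone, the node is unreachable
    if mode == "active-standby":
        # Only one link carries traffic at any time.
        return link_gbit_s
    if mode == "lacp-mclag":
        # Every link with a live switch behind it carries traffic.
        return link_gbit_s * switches_up
    raise ValueError("unknown mode: " + mode)


for mode in ("active-standby", "lacp-mclag"):
    for up in (2, 1):
        print(mode, "switches up:", up, "->",
              usable_bandwidth_gbit_s(mode, 10, up), "Gbit/s")
```

Either way the node stays reachable after a single switch failure, which is
the point; the modes only differ in what you get while both switches are up.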

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com