My first thought is not this use case, but could it be a way to have
dual-stack IPv4/IPv6?

> On Jul 6, 2018, at 6:41 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>
> Hi everyone,
>
> Input welcome on an interesting proposal that came up with a user who
> prefers to run separate networks to each of their top-of-rack switches
> instead of bonding. They would have, for example, 4 TOR switches, each
> with their own IP network, two public and two private. The motivation is,
> presumably, the cornucopia of problems one encounters with bonding and
> various switch vendors, firmwares, and opportunities for user
> misconfiguration. (I've seen my share of broken bonding setups; they
> are a huge headache to diagnose, and it's usually an even bigger headache
> to convince the network ops folks that it's their problem.)
>
> My understanding is that in order for this to work each Ceph daemon would
> need to bind to two addresses (or four, in the case of the OSD) instead
> of just one. These addresses would need to be shared throughout the
> system (in the OSDMap etc.), and then when a connection is being made, we
> would round-robin connection attempts across them. In theory the normal
> connection retry should make it "just work," provided we can tolerate the
> connection timeout/latency when we encounter a bad network.
>
> The new addrvec code that is going in for Nautilus can (in principle)
> handle the multiple addresses for each daemon. The main changes would be
> (1) defining a configuration model that tells daemons to bind to multiple
> networks (and which networks to bind to) and (2) the messenger change to
> round-robin across available addresses. And (3) some hackery so that our
> QA can cover the relevant messenger code even though we don't have
> multiple networks (probably including a made-up network everywhere would
> do the trick... we'd round-robin across it and it would always fail).
>
> Stepping back, though, I think the bigger question is: is this a good
> idea? My first reaction was that bonding and multipath in the network
> are a problem for the network, and the fact that the network vendors
> seem to regularly screw this up isn't a very compelling reason to think
> that we'd do a better job than they do. On the other hand, it seems
> possible to handle this case without too much additional code, and the
> reality seems to be that the network frequently *does* tend to screw it
> up.
>
> Anecdotally I'm told some other storage products do this, but I have a
> feeling they do it in the sense that if you're using iSCSI you can just
> define target addresses on both networks and the normal iSCSI multipath
> does its thing (perfectly, I'm sure).
>
> Thoughts?
> sage
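
For (1), a rough sketch of what the configuration model could look like in
ceph.conf, assuming it ends up being as simple as listing one subnet per TOR
network. The multi-value form and subnets below are illustrative assumptions,
not an existing interface:

  # Hypothetical ceph.conf sketch: two public and two cluster (private)
  # subnets, one per TOR switch. Each daemon would bind one address on
  # every matching network and publish all of them in its addrvec.
  [global]
      public network  = 10.1.0.0/24, 10.2.0.0/24
      cluster network = 10.3.0.0/24, 10.4.0.0/24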
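
And for (2), a standalone C++ sketch of the round-robin idea, using plain std
types rather than the actual Ceph messenger or addrvec API: keep every known
address for a peer and rotate through them on each connection attempt, so a
dead network only costs a failed connect and a retry.

  // Standalone sketch (not Ceph messenger code) of round-robin connection
  // attempts across all known addresses for a peer.
  #include <cstddef>
  #include <iostream>
  #include <string>
  #include <vector>

  struct PeerAddrs {
    std::vector<std::string> addrs;  // e.g. one address per public network
    std::size_t next = 0;            // index of the next address to try

    // Return the next address to attempt and advance the cursor.
    const std::string& pick() {
      const std::string& a = addrs[next % addrs.size()];
      next = (next + 1) % addrs.size();
      return a;
    }
  };

  int main() {
    // Hypothetical OSD reachable on two public networks.
    PeerAddrs osd{{"10.1.0.15:6800", "10.2.0.15:6800"}};

    // Each failed connect() would simply fall through to the next address;
    // the normal connection-retry logic does the rest.
    for (int attempt = 0; attempt < 4; ++attempt) {
      std::cout << "attempt " << attempt << " -> " << osd.pick() << "\n";
    }
    return 0;
  }

In the real messenger the cursor would presumably live with the per-peer
connection state, and the QA trick in (3) falls out naturally: the made-up
network is just one more address in the list that always fails and gets
skipped on the next pass.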