My first thought is not this use case, but could it be a way to have
dual-stack IPv4/IPv6?

> On Jul 6, 2018, at 6:41 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>
> Hi everyone,
>
> Input welcome on an interesting proposal that came up with a user who
> prefers to run separate networks to each of their top-of-rack switches
> instead of bonding. They would have, for example, 4 TOR switches, each
> with their own IP network, two public and two private. The motivation is,
> presumably, the cornucopia of problems one encounters with bonding and
> various switch vendors, firmwares, and opportunities for user
> misconfiguration. (I've seen my share of broken bonding setups; they
> are a huge headache to diagnose, and it's usually an even bigger headache
> to convince the network ops folks that it's their problem.)
>
> My understanding is that in order for this to work each Ceph daemon would
> need to bind to two addresses (or four, in the case of the OSD) instead
> of just one. These addresses would need to be shared throughout the
> system (in the OSDMap etc.), and then when a connection is being made, we
> would round-robin connection attempts across them. In theory the normal
> connection retry should make it "just work," provided we can tolerate the
> connection timeout/latency when we encounter a bad network.
>
> The new addrvec code that is going in for Nautilus can (in principle)
> handle the multiple addresses for each daemon. The main changes would be
> (1) defining a configuration model that tells daemons to bind to multiple
> networks (and which networks to bind to) and (2) the messenger change to
> round-robin across available addresses. And (3) some hackery so that our
> QA can cover the relevant messenger code even though we don't have
> multiple networks (probably including a made-up network everywhere would
> do the trick... we'd round-robin across it and it would always fail).
>
> Stepping back, though, I think the bigger question is: is this a good
> idea? My first reaction was that bonding and multipath in the network
> are a problem for the network, and the fact that the network vendors
> seem to regularly screw this up isn't a very compelling reason to think
> that we'd do a better job than they do. On the other hand, it seems
> possible to handle this case without too much additional code, and the
> reality seems to be that the network frequently *does* tend to screw it
> up.
>
> Anecdotally I'm told some other storage products do this, but I have a
> feeling they do it in the sense that if you're using iSCSI you can just
> define target addresses on both networks and the normal iSCSI multipath
> does its thing (perfectly, I'm sure).
>
> Thoughts?
> sage
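
For (1), a rough sketch of what the configuration model could look like in
ceph.conf, assuming it ends up being as simple as listing one subnet per TOR
network. The multi-value form and subnets below are illustrative assumptions,
not an existing interface:

  # Hypothetical ceph.conf sketch: two public and two cluster (private)
  # subnets, one per TOR switch. Each daemon would bind one address on
  # every matching network and publish all of them in its addrvec.
  [global]
      public network  = 10.1.0.0/24, 10.2.0.0/24
      cluster network = 10.3.0.0/24, 10.4.0.0/24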
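
And for (2), a standalone C++ sketch of the round-robin idea, using plain std
types rather than the actual Ceph messenger or addrvec API: keep every known
address for a peer and rotate through them on each connection attempt, so a
dead network only costs a failed connect and a retry.

  // Standalone sketch (not Ceph messenger code) of round-robin connection
  // attempts across all known addresses for a peer.
  #include <cstddef>
  #include <iostream>
  #include <string>
  #include <vector>

  struct PeerAddrs {
    std::vector<std::string> addrs;  // e.g. one address per public network
    std::size_t next = 0;            // index of the next address to try

    // Return the next address to attempt and advance the cursor.
    const std::string& pick() {
      const std::string& a = addrs[next % addrs.size()];
      next = (next + 1) % addrs.size();
      return a;
    }
  };

  int main() {
    // Hypothetical OSD reachable on two public networks.
    PeerAddrs osd{{"10.1.0.15:6800", "10.2.0.15:6800"}};

    // Each failed connect() would simply fall through to the next address;
    // the normal connection-retry logic does the rest.
    for (int attempt = 0; attempt < 4; ++attempt) {
      std::cout << "attempt " << attempt << " -> " << osd.pick() << "\n";
    }
    return 0;
  }

In the real messenger the cursor would presumably live with the per-peer
connection state, and the QA trick in (3) falls out naturally: the made-up
network is just one more address in the list that always fails and gets
skipped on the next pass.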