Greg,

see below.

On Thu, 2015-12-03 at 13:25 -0800, Gregory Farnum wrote:
> On Thu, Dec 3, 2015 at 12:13 PM, Martin Millnert <martin@xxxxxxxxxxx> wrote:
> > Hi,
> >
> > we're deploying Ceph on Linux for multiple purposes.
> > We want to build network isolation in our L3 DC network using VRF:s.
> >
> > In the case of Ceph this means that we are separating the Ceph public
> > network from the Ceph cluster network, in this manner, into separate
> > network routing domains (for those who do not know what a VRF is).
> >
> > Furthermore, we're also running (per-VRF) dynamically routed L3 all the
> > way to the hosts (OSPF from ToR switch), and need to separate route
> > tables on the hosts properly. This is done using "ip rule" today.
> > We use VLANs to separate the VRF:s from each other between ToR and
> > hosts, so there is no problem to determine which VRF an incoming packet
> > to a host belongs to (iif $dev).
> >
> > The problem is selecting the proper route table for outbound packets
> > from the host.
> >
> > There is current work in progress for a redesign [1] of the old VRF [2]
> > design in the Linux Kernel. At least in the new design, there is an
> > intended way of placing processes within a VRF such that, similar to
> > network namespaces, the processes are unaware that they are in fact
> > living within a VRF.
> >
> > This would work for a process such as the 'mon', which only lives in the
> > public network.
> >
> > But it doesn't work for the OSD, which uses separate sockets for public
> > and cluster networks.
> >
> > There is however a real simple solution:
> >  1. Use something similar to
> >     setsockopt(sockfd, SOL_SOCKET, SO_MARK, puborclust_val, sizeof(one))
> >     (untested)
> >  2. set up "ip rule" for outbound traffic to select an appropriate route
> >     table based on the MARK value of "puborclust_val" above.
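To make the two steps quoted above concrete, here is a minimal sketch. It is not existing Ceph code and is untested against Ceph: the helper names, the mark values 1 and 2, and the table numbers 100 and 200 are placeholders chosen for illustration. Note that setsockopt() expects a pointer to the value and the size of that same value, and that setting SO_MARK on Linux requires CAP_NET_ADMIN.

```c
/* Sketch only: SO_MARK is Linux-specific and needs CAP_NET_ADMIN. */
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical mark values; any two distinct non-zero values would do. */
enum { MARK_PUBLIC = 1, MARK_CLUSTER = 2 };

/* Hypothetical helper: choose the mark by the network a socket serves. */
uint32_t mark_for_network(int is_cluster)
{
    return is_cluster ? MARK_CLUSTER : MARK_PUBLIC;
}

/* Tag sockfd with a mark. Returns 0 on success, -1 with errno set. */
int set_socket_mark(int sockfd, uint32_t mark)
{
#ifdef SO_MARK
    /* Pass the address of the value and the size of that same value. */
    return setsockopt(sockfd, SOL_SOCKET, SO_MARK, &mark, sizeof(mark));
#else
    /* Platforms without SO_MARK: report the option as unsupported. */
    (void)sockfd;
    (void)mark;
    errno = ENOPROTOOPT;
    return -1;
#endif
}

/*
 * Matching policy routing on the host (assumed table numbers):
 *   ip rule add fwmark 1 lookup 100    # public VRF route table
 *   ip rule add fwmark 2 lookup 200    # cluster VRF route table
 */
```

An OSD-like process would call set_socket_mark(fd, mark_for_network(0)) on its public socket and set_socket_mark(fd, mark_for_network(1)) on its cluster socket, in both cases before connect() or listen(), so that the route lookup for the first outbound packet already sees the mark.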
> >
> > AFAIK BSD doesn't have SO_MARK specifically, but this is a quite simple
> > option that adds a lot of utility for us, and, I imagine others.
> >
> > I'm willing to write it and test it too. But before doing that, I'm
> > interested in feedback. Would obviously prefer it to be merged.
>
> I'm probably just being dense here, but I don't quite understand what
> all this is trying to accomplish. It looks like it's essentially
> trying to set up VLANs (with different rules) over a single physical
> network interface, that is still represented to userspace as a single
> device with a single IP. Is that right?

That's almost what it is, with two differences:
 1) there are separate route tables per VLAN,
 2) each VLAN interface (public, cluster) has its own address.

With separate route tables, there's a general problem of picking the
correct table on outbound connections.

> What's the point of doing that with Ceph?

Classification & prioritization of Ceph network traffic. In our case,
prioritization of cluster traffic over client traffic. See my email to
Wido.

/Martin