On Mon, Dec 7, 2015 at 5:36 AM, Martin Millnert <martin@xxxxxxxxxxx> wrote:
> Greg,
>
> see below.
>
> On Thu, 2015-12-03 at 13:25 -0800, Gregory Farnum wrote:
>> On Thu, Dec 3, 2015 at 12:13 PM, Martin Millnert <martin@xxxxxxxxxxx> wrote:
>> > Hi,
>> >
>> > we're deploying Ceph on Linux for multiple purposes.
>> > We want to build network isolation in our L3 DC network using VRF:s.
>> >
>> > In the case of Ceph this means that we are separating the Ceph public
>> > network from the Ceph cluster network, in this manner, into separate
>> > network routing domains (for those who do not know what a VRF is).
>> >
>> > Furthermore, we're also running (per-VRF) dynamically routed L3 all the
>> > way to the hosts (OSPF from ToR switch), and need to separate route
>> > tables on the hosts properly. This is done using "ip rule" today.
>> > We use VLANs to separate the VRF:s from each other between ToR and
>> > hosts, so there is no problem to determine which VRF an incoming packet
>> > to a host belongs to (iif $dev).
>> >
>> > The problem is selecting the proper route table for outbound packets
>> > from the host.
>> >
>> > There is current work in progress for a redesign [1] of the old VRF [2]
>> > design in the Linux Kernel. At least in the new design, there is an
>> > intended way of placing processes within a VRF such that, similar to
>> > network namespaces, the processes are unaware that they are in fact
>> > living within a VRF.
>> >
>> > This would work for a process such as the 'mon', which only lives in the
>> > public network.
>> >
>> > But it doesn't work for the OSD, which uses separate sockets for public
>> > and cluster networks.
>> >
>> > There is however a real simple solution:
>> > 1. Use something similar to
>> > setsockopt(sockfd, SOL_SOCKET, SO_MARK, puborclust_val, sizeof(one))
>> > (untested)
>> > 2. set up "ip rule" for outbound traffic to select an appropriate route
>> > table based on the MARK value of "puborclust_val" above.
>> >
>> > AFAIK BSD doesn't have SO_MARK specifically, but this is a quite simple
>> > option that adds a lot of utility for us, and, I imagine others.
>> >
>> > I'm willing to write it and test it too. But before doing that, I'm
>> > interested in feedback. Would obviously prefer it to be merged.
>>
>> I'm probably just being dense here, but I don't quite understand what
>> all this is trying to accomplish. It looks like it's essentially
>> trying to set up VLANs (with different rules) over a single physical
>> network interface, that is still represented to userspace as a single
>> device with a single IP. Is that right?
>
> That's almost what it is, with two differences:
> 1) there are separated route tables per VLAN,
> 2) Each VLAN interface (public, cluster) has its own address.

Okay, but if each interface has its own address, why do you need Ceph
to do anything at all? You can specify the public and cluster
addresses, the sockets will bind to the appropriate interface, and then
you can apply rules based on the interface/VLAN each socket is part of.
Right?
-Greg