On Mon, Aug 25, 2014 at 9:41 AM, Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> wrote:
> On 25/08/2014 18:13, Andy Lutomirski wrote:
>
>> On Mon, Aug 25, 2014 at 8:43 AM, Nicolas Dichtel
>> <nicolas.dichtel@xxxxxxxxx> wrote:
>>>
>>> On 25/08/2014 16:04, Andy Lutomirski wrote:
>>>
>>>> On Aug 25, 2014 6:30 AM, "Nicolas Dichtel" <nicolas.dichtel@xxxxxxxxx>
>>>> wrote:
>>>>>>
>>>>>> CRIU wants to save the complete state of a namespace and then restore
>>>>>> it. For that to work, any information exposed to things in the
>>>>>> namespace *cannot* be globally unique or unique per boot, since CRIU
>>>>>> needs to arrange for that information to match whatever it was when
>>>>>> CRIU saved it.
>>>>>
>>>>> How are the ifindexes of network devices managed? These ifindexes are
>>>>> unique per boot, so they can change depending on the order in which
>>>>> netdevs are created. They are unique per boot and exposed to
>>>>> userspace ...
>>>>
>>>> This does not appear to be true.
>>>>
>>>> $ sudo unshare --net
>>>> # ip link add veth0 type veth peer name veth1
>>>> # ip link
>>>> 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
>>>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>> 2: veth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>>>>     link/ether 06:0d:59:c7:a6:a8 brd ff:ff:ff:ff:ff:ff
>>>> 3: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>>>>     link/ether b2:5c:8b:f2:12:28 brd ff:ff:ff:ff:ff:ff
>>>> # logout
>>>> $ ip link
>>>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
>>>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>> 3: em1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
>>>>
>>> I've probably misunderstood what you're trying to say. ifindexes are
>>> unique per boot and per netns.
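Andy's transcript can be mirrored in a few lines of Python. This is a toy model, not kernel code: it assumes, as a simplification, that every network namespace keeps its own ifindex counter and creates its loopback device first, so the same index can name different devices in different namespaces. The `NetNS` class and its methods are hypothetical; the device names echo the transcript, but the host-side indexes are illustrative.

```python
class NetNS:
    """Toy network namespace with a private ifindex counter (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.devices = {}        # ifindex -> device name
        self._next_ifindex = 1   # assumption: each netns starts counting at 1
        self.add_device("lo")    # loopback is created first, so it gets index 1

    def add_device(self, devname):
        """Allocate the next free index in *this* namespace only."""
        ifindex = self._next_ifindex
        self._next_ifindex += 1
        self.devices[ifindex] = devname
        return ifindex


host = NetNS("host")
host.add_device("em1")

child = NetNS("unshared")
child.add_device("veth1")
child.add_device("veth0")

print(host.devices)   # {1: 'lo', 2: 'em1'}
print(child.devices)  # {1: 'lo', 2: 'veth1', 3: 'veth0'}
```

Because the child's indexes depend only on what was created inside the child, nothing about the host's devices shows up in them, which is the property Andy is pointing at.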
>>
>> I think we both misunderstood each other. The ifindexes are unique
>> *per netns*, which means that, if you're unprivileged in a netns,
>> global information doesn't leak to you. I think this is good.
>
> Ok, I agree. I think audit daemons always run as privileged users.
>
>>
>>>>
>>>> Let me try again, with emphasis in the right place.
>>>>
>>>> I think that *code running in a namespace* has no business even
>>>> knowing a unique identity of *that namespace* from the perspective of
>>>> the host.
>>>>
>>>> In your example, if there's a veth device between netns A and netns B,
>>>> then code *in netns A* has no business knowing the identity of its
>>>> veth peer if its peer (B) is a sibling or ancestor. It also IMO has
>>>> no business knowing the identity of its own netns (A) other than as
>>>> "my netns".
>>>
>>> I do not agree (see the example below).
>>>
>>>>
>>>> If A and B are siblings, then their parent needs to know where that
>>>> veth device goes, but I think this is already the case to a sufficient
>>>> extent today.
>>>
>>> I'm not aware of a hierarchy between netns. A daemon should be able to
>>> get the full network configuration, even if it's started when this
>>> configuration is already applied, i.e. even if it doesn't know what
>>> happened before it started.
>>
>> I don't know exactly which namespaces have an explicit hierarchy, but
>> there is certainly a hierarchy of *user* namespaces, and network
>> namespaces live in user namespaces, so they at least have somewhat of
>> a hierarchy.
>>
>>>
>>>>
>>>> I feel like this discussion is falling into a common trap of new API
>>>> discussions. Can one of you who wants this API please articulate,
>>>> with a reasonably precise example, what it is that you want to do, why
>>>> you can't easily do it already, and how this API helps?
>>>> I currently understand how the API creates problems, but I don't
>>>> understand how it solves any problems, and I will NAK it (and I
>>>> suspect that Eric will, too, which is pretty much fatal) unless that
>>>> changes.
>>>
>>> What I'm trying to solve is to have full info in netlink messages sent
>>> by the kernel, thus being able to identify a peer netns (and this is
>>> close to what the audit guys are trying to have). Theoretically,
>>> messages sent by the kernel can be reused as-is to recreate the same
>>> configuration. This is not the case with x-netns devices. Here is an
>>> example, with ip tunnels:
>>>
>>> $ ip netns add 1
>>> $ ip link add ipip1 type ipip remote 10.16.0.121 local 10.16.0.249 dev eth0
>>> $ ip -d link ls ipip1
>>> 8: ipip1@eth0: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default
>>>     link/ipip 10.16.0.249 peer 10.16.0.121 promiscuity 0
>>>     ipip remote 10.16.0.121 local 10.16.0.249 dev eth0 ttl inherit pmtudisc
>>> $ ip link set ipip1 netns 1
>>> $ ip netns exec 1 ip -d link ls ipip1
>>> 8: ipip1@tunl0: <POINTOPOINT,NOARP,M-DOWN> mtu 1480 qdisc noop state DOWN mode DEFAULT group default
>>>     link/ipip 10.16.0.249 peer 10.16.0.121 promiscuity 0
>>>     ipip remote 10.16.0.121 local 10.16.0.249 dev tunl0 ttl inherit pmtudisc
>>>
>>> Now the information obtained with 'ip link' is wrong and incomplete:
>>> - the link dev is now tunl0 instead of eth0, because we only got an
>>>   ifindex from the kernel without any netns information.
>>> - the encapsulation addresses are not part of this netns, but the user
>>>   doesn't know that (again because netns info is missing). These IPv4
>>>   addresses may exist in this netns.
>>> - it's not possible to recreate the same netdevice with this info.
>>
>> Aha. That's a genuine problem.
>>
>> Perhaps we need a concept of which netnses should be able to see each
>> other.
>
> Yes, I agree.
> This is not required for all netns; only a subset of netns
> should be able to see each other.
>
>>
>> I think I would be okay with a somewhat different outcome from your
>> example:
>>
>> $ ip netns exec 1 ip -d link ls ipip1
>> 8: ipip1@[unknown device in another namespace]: <POINTOPOINT,NOARP,M-DOWN> mtu 1480 qdisc noop state DOWN
>>
>> I think this outcome is mandatory if netns 1 lives in a subsidiary
>> user namespace.
>
> Yes.
>
>>
>> Certainly, if you do the 'ip link' in the original namespace, I agree
>> that this should work.
>
> And yes :)
>
> I will update my previous proposal
> (http://thread.gmane.org/gmane.linux.network/315933/focus=321753)
> to allow getting an id for a peer netns only when the user namespace is
> the same.

I think it should work if the peer userns is the same or a descendant.
I also wonder whether the peer's ifindex should be suppressed if the peer
userns is not the same or a descendant.

Now you just have to get Eric to be happy with the id allocation. :)
This may be nontrivial.

>
>>
>> For most namespace types, this all works transparently, since
>> everything has a real identity all the way up the hierarchy. Network
>> namespaces are different.
>>
>> I don't think that exposing serial numbers in /proc is a good
>> solution, both for the reasons already described and because I don't
>> think that iproute2 should need to muck around with /proc to function
>
> A netlink API is probably enough. But it will help only for the network
> problem, not for audit. I was hoping to find a common solution.

I still don't understand why audit needs anything beyond the audit part
of this patch set. I have no problem with audit seeing that
migrated/restored namespaces are really brand-new namespaces, as long as
the code in those namespaces isn't exposed to it.

>
>
>> correctly. Eric, any clever ideas here? Do we need fancier netlink
>> messages for this?
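The visibility rule discussed above can be modelled in a few lines of Python, as a sketch rather than a proposal for the real interface. It assumes a hypothetical netlink attribute that reports a device's lower link as a (peer netns, ifindex) pair instead of a bare ifindex, and it implements the suggestion that the peer is revealed only when its user namespace is the viewer's own or a descendant; otherwise it prints the placeholder from the example above and withholds the peer's ifindex as well. `UserNS`, `NetNS`, `link_name` and the other names are hypothetical; none of this is kernel or iproute2 code.

```python
class UserNS:
    """Toy user namespace; only the parent link matters here."""

    def __init__(self, parent=None):
        self.parent = parent


class NetNS:
    """Toy network namespace owned by a user namespace."""

    def __init__(self, name, owner, devices):
        self.name = name
        self.owner = owner        # owning UserNS
        self.devices = devices    # ifindex -> device name


def same_or_descendant(candidate, ancestor):
    """True if `candidate` is `ancestor` or lies below it in the userns tree."""
    ns = candidate
    while ns is not None:
        if ns is ancestor:
            return True
        ns = ns.parent
    return False


def ifindex_only_name(viewer, link_ifindex):
    """Current behaviour: only an ifindex arrives, so it is looked up
    in the viewer's own netns, which may name the wrong device."""
    return viewer.devices.get(link_ifindex, "?")


def link_name(viewer, link_netns, link_ifindex):
    """Hypothetical behaviour: the lower device arrives as (netns, ifindex).
    The peer is revealed only if its userns is the viewer's own or a
    descendant of it; otherwise both name and ifindex are withheld."""
    if same_or_descendant(link_netns.owner, viewer.owner):
        return link_netns.devices.get(link_ifindex, "?")
    return "[unknown device in another namespace]"


init_userns = UserNS()
child_userns = UserNS(parent=init_userns)

# eth0 in the original netns and tunl0 in netns 1 happen to share ifindex 2.
init_net = NetNS("init_net", init_userns, {1: "lo", 2: "eth0"})
netns1 = NetNS("1", init_userns, {1: "lo", 2: "tunl0", 8: "ipip1"})
sandbox = NetNS("sandbox", child_userns, {1: "lo", 2: "tunl0", 3: "veth1"})

print(ifindex_only_name(netns1, 2))     # tunl0 (the bug in the ipip example)
print(link_name(netns1, init_net, 2))   # eth0 (same userns: visible)
print(link_name(sandbox, init_net, 2))  # [unknown device in another namespace]
print(link_name(init_net, sandbox, 3))  # veth1 (descendant userns: visible)
```

Run from netns 1, the ifindex-only lookup reproduces the tunl0/eth0 confusion from the ipip example, while the pair-based lookup recovers eth0. How the kernel would actually identify the peer netns (the id allocation Eric would need to accept) is deliberately left out of the model.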
>>
>> --Andy
>>
>

--
Andy Lutomirski
AMA Capital Management, LLC