On Wed, May 27, 2020 at 07:48 PM CEST, sdf@xxxxxxxxxx wrote: > On 05/27, Jakub Sitnicki wrote: >> Add support for bpf() syscall subcommands that operate on >> bpf_link (LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO) for attach points tied to >> network namespaces (that is flow dissector at the moment). > >> Link-based and prog-based attachment can be used interchangeably, but only >> one can be in use at a time. Attempts to attach a link when a prog is >> already attached directly, and the other way around, will be met with >> -EBUSY. > >> Attachment of multiple links of same attach type to one netns is not >> supported, with the intention to lift it when a use-case presents >> itself. Because of that attempts to create a netns link, when one already >> exists result in -E2BIG error, signifying that there is no space left for >> another attachment. > >> Link-based attachments to netns don't keep a netns alive by holding a ref >> to it. Instead links get auto-detached from netns when the latter is being >> destroyed by a pernet pre_exit callback. > >> When auto-detached, link lives in defunct state as long there are open FDs >> for it. -ENOLINK is returned if a user tries to update a defunct link. > >> Because bpf_link to netns doesn't hold a ref to struct net, special care is >> taken when releasing the link. The netns might be getting torn down when >> the release function tries to access it to detach the link. > >> To ensure the struct net object is alive when release function accesses it >> we rely on the fact that cleanup_net(), struct net destructor, calls >> synchronize_rcu() after invoking pre_exit callbacks. If auto-detach from >> pre_exit happens first, link release will not attempt to access struct net. > >> Same applies the other way around, network namespace doesn't keep an >> attached link alive because by not holding a ref to it. Instead bpf_links >> to netns are RCU-freed, so that pernet pre_exit callback can safely access >> and auto-detach the link when racing with link release/free. > > [..] >> + rcu_read_lock(); >> for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) { >> - if (rcu_access_pointer(net->bpf.progs[type])) >> + if (rcu_access_pointer(net->bpf.links[type])) >> + bpf_netns_link_auto_detach(net, type); >> + else if (rcu_access_pointer(net->bpf.progs[type])) >> __netns_bpf_prog_detach(net, type); >> } >> + rcu_read_unlock(); > Aren't you doing RCU_INIT_POINTER in __netns_bpf_prog_detach? > Is it allowed under rcu_read_load? Yes, that's true. __netns_bpf_prog_detach does RCU_INIT_POINTER(net->bpf.progs[type], NULL); RCU read lock is here for the rcu_dereference() that happens in bpf_netns_link_auto_detach (netns doesn't hold a ref to bpf_link): /* Called with RCU read lock. */ static void __net_exit bpf_netns_link_auto_detach(struct net *net, enum netns_bpf_attach_type type) { struct bpf_netns_link *net_link; struct bpf_link *link; link = rcu_dereference(net->bpf.links[type]); if (!link) return; net_link = to_bpf_netns_link(link); RCU_INIT_POINTER(net_link->net, NULL); } I've pulled it up, out of the loop, perhaps too eagerly and just made it confusing, considering we're iterating over a 1-item array :-) Now, I'm also doing RCU_INIT_POINTER on the *contents of bpf_link* in bpf_netns_link_auto_detach. Is that allowed? I'm not sure, that bit is hazy to me. There are no concurrent writers to net_link->net, just readers, i.e. bpf_netns_link_release(). And I know bpf_link won't be freed until the grace period elapses. sparse and CONFIG_PROVE_RCU are not shouting at me, but please do if it doesn't hold up or make sense. I certainly can push the rcu_read_lock() down into bpf_netns_link_auto_detach(). -jkbs