On 10/02, Andrii Nakryiko wrote:
> On Wed, Oct 2, 2019 at 10:35 AM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> >
> > Always use init_net flow dissector BPF program if it's attached and fall
> > back to the per-net namespace one. Also, deny installing new programs if
> > there is already one attached to the root namespace.
> > Users can still detach their BPF programs, but can't attach any
> > new ones (-EPERM).
> >
> > Cc: Petar Penkov <ppenkov@xxxxxxxxxx>
> > Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
> > ---
> >  Documentation/bpf/prog_flow_dissector.rst |  3 +++
> >  net/core/flow_dissector.c                 | 11 ++++++++++-
> >  2 files changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/bpf/prog_flow_dissector.rst b/Documentation/bpf/prog_flow_dissector.rst
> > index a78bf036cadd..4d86780ab0f1 100644
> > --- a/Documentation/bpf/prog_flow_dissector.rst
> > +++ b/Documentation/bpf/prog_flow_dissector.rst
> > @@ -142,3 +142,6 @@ BPF flow dissector doesn't support exporting all the metadata that in-kernel
> >  C-based implementation can export. Notable example is single VLAN (802.1Q)
> >  and double VLAN (802.1AD) tags. Please refer to the ``struct bpf_flow_keys``
> >  for a set of information that's currently can be exported from the BPF context.
> > +
> > +When BPF flow dissector is attached to the root network namespace (machine-wide
> > +policy), users can't override it in their child network namespaces.
> > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> > index 7c09d87d3269..494e2016fe84 100644
> > --- a/net/core/flow_dissector.c
> > +++ b/net/core/flow_dissector.c
> > @@ -115,6 +115,11 @@ int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
> >  	struct bpf_prog *attached;
> >  	struct net *net;
> >
> > +	if (rcu_access_pointer(init_net.flow_dissector_prog)) {
> > +		/* Can't override root flow dissector program */
> > +		return -EPERM;
> > +	}
>
> This is racy, shouldn't this be checked after grabbing a lock below?
What kind of race do you have in mind?

Even if I put this check under the mutex, the race is still possible:
if two CPUs start attaching flow dissector programs at the same time
(i.e. both call sys_bpf(BPF_PROG_ATTACH)), one to the root ns and the
other to a non-root ns, the CPU that is attaching to the non-root ns can
grab the mutex first, pass all the checks and attach the prog (higher
frequency, turbo boost, etc).

The mutex is there to protect only against concurrent attaches to the
_same_ netns. For the sake of simplicity we have a global one instead
of a mutex per netns.

So I'd rather not grab the mutex and keep it simple. Even if there is a
race, in __skb_flow_dissect we always check init_net first, so the root
program is the one that runs once it is attached.

> > +
> >  	net = current->nsproxy->net_ns;
> >  	mutex_lock(&flow_dissector_mutex);
> >  	attached = rcu_dereference_protected(net->flow_dissector_prog,
> > @@ -910,7 +915,11 @@ bool __skb_flow_dissect(const struct net *net,
> >  	WARN_ON_ONCE(!net);
> >  	if (net) {
> >  		rcu_read_lock();
> > -		attached = rcu_dereference(net->flow_dissector_prog);
> > +		attached =
> > +			rcu_dereference(init_net.flow_dissector_prog);
> > +
> > +		if (!attached)
> > +			attached = rcu_dereference(net->flow_dissector_prog);
> >
> >  		if (attached) {
> >  			struct bpf_flow_keys flow_keys;
> > --
> > 2.23.0.444.g18eeb5a265-goog
> >
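
To make the argument about the race concrete, here is a rough userspace sketch of the attach check and the lookup order from the patch. It is not kernel code: struct netns, struct prog, root_ns, attach_prog() and lookup_prog() are hypothetical stand-ins for struct net, struct bpf_prog, init_net and the two patched functions, and locking, RCU and the existing per-netns checks are left out on purpose. It only replays the interleaving serially, where the non-root attach wins the race and the root attach lands afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bpf_prog and struct net. */
struct prog { int id; };
struct netns { struct prog *flow_dissector_prog; };

/* Plays the role of init_net (assumption: a single global root ns). */
static struct netns root_ns;

/* Mirrors the new attach-time check: refuse once the root ns has a program. */
static int attach_prog(struct netns *net, struct prog *p)
{
	assert(net != NULL && p != NULL);

	if (root_ns.flow_dissector_prog)
		return -EPERM;	/* can't override root flow dissector program */

	net->flow_dissector_prog = p;
	return 0;
}

/* Mirrors the new lookup order: root ns first, then the per-netns program. */
static struct prog *lookup_prog(const struct netns *net)
{
	struct prog *attached = root_ns.flow_dissector_prog;

	if (!attached)
		attached = net->flow_dissector_prog;

	return attached;
}
```

If the non-root attach goes first and the root attach follows, both programs end up attached, but lookup_prog() on the child netns returns the root program from then on, and any later attach attempt gets -EPERM. The stale child program stays attached but is never selected, which is why moving the check under the mutex would not change the outcome.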