Re: [PATCH bpf-next 1/2] bpf/flow_dissector: add mode to enforce global BPF flow dissector

Stanislav Fomichev <sdf@xxxxxxxxxxx> · Wed, 2 Oct 2019 18:43:56 -0700

On 10/02, Andrii Nakryiko wrote:
> On Wed, Oct 2, 2019 at 10:35 AM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> >
> > Always use init_net flow dissector BPF program if it's attached and fall
> > back to the per-net namespace one. Also, deny installing new programs if
> > there is already one attached to the root namespace.
> > Users can still detach their BPF programs, but can't attach any
> > new ones (-EPERM).
> >
> > Cc: Petar Penkov <ppenkov@xxxxxxxxxx>
> > Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
> > ---
> >  Documentation/bpf/prog_flow_dissector.rst |  3 +++
> >  net/core/flow_dissector.c                 | 11 ++++++++++-
> >  2 files changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/bpf/prog_flow_dissector.rst b/Documentation/bpf/prog_flow_dissector.rst
> > index a78bf036cadd..4d86780ab0f1 100644
> > --- a/Documentation/bpf/prog_flow_dissector.rst
> > +++ b/Documentation/bpf/prog_flow_dissector.rst
> > @@ -142,3 +142,6 @@ BPF flow dissector doesn't support exporting all the metadata that in-kernel
> >  C-based implementation can export. Notable example is single VLAN (802.1Q)
> >  and double VLAN (802.1AD) tags. Please refer to the ``struct bpf_flow_keys``
> >  for a set of information that's currently can be exported from the BPF context.
> > +
> > +When BPF flow dissector is attached to the root network namespace (machine-wide
> > +policy), users can't override it in their child network namespaces.
> > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> > index 7c09d87d3269..494e2016fe84 100644
> > --- a/net/core/flow_dissector.c
> > +++ b/net/core/flow_dissector.c
> > @@ -115,6 +115,11 @@ int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
> >         struct bpf_prog *attached;
> >         struct net *net;
> >
> > +       if (rcu_access_pointer(init_net.flow_dissector_prog)) {
> > +               /* Can't override root flow dissector program */
> > +               return -EPERM;
> > +       }
> 
> This is racy, shouldn't this be checked after grabbing a lock below?
What kind of race do you have in mind?

Even if I put this check under the mutex, two CPUs can still call
sys_bpf(BPF_PROG_ATTACH) at the same time, one attaching to the root ns
and the other to a non-root ns. The CPU attaching to the non-root ns can
grab the mutex first (higher frequency, turbo boost, etc.), pass all the
checks and attach its prog.

The mutex only protects against concurrent attaches to the _same_ netns.
For the sake of simplicity we have a single global mutex instead of one
mutex per netns.

So I'd rather not grab the mutex and keep it simple. Even if there is a
race, in __skb_flow_dissect we always check init_net first.
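
To spell out the ordering argument, here is a rough user-space model of
the two paths. It is only a sketch, not kernel code: struct prog, struct
netns, root_ns, model_attach() and model_select() are made-up stand-ins
for bpf_prog, struct net, init_net, the attach path and the lookup in
__skb_flow_dissect, and RCU and the mutex are left out on purpose.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; illustration only. */
struct prog {
	const char *name;
};

struct netns {
	struct prog *flow_dissector_prog;
};

/* Plays the role of init_net in this model. */
static struct netns root_ns;

/*
 * Attach-time policy from the patch: once the root namespace has a
 * program, any new attach is refused with -EPERM. The -EEXIST branch
 * models the existing "one program per netns" check.
 */
static int model_attach(struct netns *ns, struct prog *p)
{
	if (root_ns.flow_dissector_prog)
		return -EPERM;
	if (ns->flow_dissector_prog)
		return -EEXIST;
	ns->flow_dissector_prog = p;
	return 0;
}

/*
 * Dissect-time lookup from the patch: the root program always wins,
 * and the per-netns program is only a fallback.
 */
static struct prog *model_select(const struct netns *ns)
{
	struct prog *p = root_ns.flow_dissector_prog;

	if (!p)
		p = ns->flow_dissector_prog;
	return p;
}
```

In this model, if a non-root attach slips in just before the root
attach, both calls succeed, but model_select() returns the root program
for every namespace from then on. The child program stays attached and
is simply never picked while the root one is present.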

> > +
> >         net = current->nsproxy->net_ns;
> >         mutex_lock(&flow_dissector_mutex);
> >         attached = rcu_dereference_protected(net->flow_dissector_prog,
> > @@ -910,7 +915,11 @@ bool __skb_flow_dissect(const struct net *net,
> >         WARN_ON_ONCE(!net);
> >         if (net) {
> >                 rcu_read_lock();
> > -               attached = rcu_dereference(net->flow_dissector_prog);
> > +               attached =
> > +                       rcu_dereference(init_net.flow_dissector_prog);
> > +
> > +               if (!attached)
> > +                       attached = rcu_dereference(net->flow_dissector_prog);
> >
> >                 if (attached) {
> >                         struct bpf_flow_keys flow_keys;
> > --
> > 2.23.0.444.g18eeb5a265-goog
> >