Re: [PATCH] selinux,xfrm: fix dangling refcount on deferred skb free

Paul Moore <paul@xxxxxxxxxxxxxx> · Thu, 7 Nov 2024 18:50:50 -0500

On Wed, Nov 6, 2024 at 12:00 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> On Wed, Nov 6, 2024 at 5:54 PM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
> > On Wed, Nov 6, 2024 at 5:13 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Nov 6, 2024 at 4:55 PM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
> > > >
> > > > SELinux tracks the number of allocated xfrm_state/xfrm_policy objects
> > > > (via the selinux_xfrm_refcount variable) as an input in deciding if peer
> > > > labeling should be used.
> > > >
> > > > However, as a result of commits f35f821935d8 ("tcp: defer skb freeing
> > > > after socket lock is released") and 68822bdf76f1 ("net: generalize skb
> > > > freeing deferral to per-cpu lists"), freeing of a sk_buff object, which
> > > > may hold a reference to an xfrm_state object, can be deferred for
> > > > processing on another CPU core, so even after xfrm_state is deleted from
> > > > the configuration by userspace, the refcount isn't decremented until the
> > > > deferred freeing of relevant sk_buffs happens. On a system with many
> > > > cores this can take a very long time (even minutes or more if the system
> > > > is not very active), leading to peer labeling being enabled for much
> > > > longer than expected.
> > > >
> > > > Fix this by moving the selinux_xfrm_refcount decrementing to just after
> > > > the actual deletion of the xfrm objects rather than waiting for the
> > > > freeing to happen. For xfrm_policy it currently doesn't seem to be
> > > > necessary, but let's do the same there for consistency and
> > > > future-proofing.
> > > >
> > > > We hit this issue on a specific aarch64 256-core system, where the
> > > > sequence of unix_socket/test and inet_socket/tcp/test from
> > > > selinux-testsuite [1] would quite reliably trigger this scenario, and a
> > > > subsequent sctp/test run would then stumble because the policy for that
> > > > test misses some rules that would make it work under peer labeling
> > > > enabled (namely it was getting the netif::egress permission denied in
> > > > some of the test cases).
> > > >
> > > > [1] https://github.com/SELinuxProject/selinux-testsuite/
> > > >
> > > > Fixes: f35f821935d8 ("tcp: defer skb freeing after socket lock is released")
> > > > Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
> > > > Signed-off-by: Ondrej Mosnacek <omosnace@xxxxxxxxxx>
> > > > ---
> > >
> > > Can we explain why TCP packets sitting in TCP receive queues would
> > > need to keep xfrm_state around ?
> > >
> > > With thousands of TCP sockets. I would imagine that a similar issue
> > > would be hit,
> > > regardless of f35f821935d8 ("tcp: defer skb freeing after socket lock
> > > is released") and 68822bdf76f1 ("net: generalize skb freeing deferral
> > > to per-cpu lists")
> > >
> > > We remove the dst from these incoming packets (see skb_dst_drop() in
> > > tcp_data_queue() and tcp_add_backlog()),
> > > I do not see how XFRM state could be kept ?
> >
> > The problem is not with the xfrm_state reference via dst_entry, but
> > the one in skb_ext (skb->extensions) -> sec_path
> > (skb_ext_get_ptr(skb->extensions, SKB_EXT_SEC_PATH)) -> the xvec
> > array. But you have a point that I should say that in the commit
> > message...
> >
>
> So some secpath_reset() calls should be added in various protocol handlers
> before packets are possibly queued for hours in a socket queue  ?
>
> I see one in l2tp_eth_dev_recv().

We just need to make sure that skb_sec_path()/SKB_EXT_SEC_PATH is
still valid when the socket filter is run as that is currently where
the LSM hooks into the stack and authorizes a packet to be received on
a sock.

If there are xfrm packets still alive somewhere in the system in a way
that they could be sent or consumed by a task then the SELinux xfrm
reference count should probably still be non-zero, unless of course
we've already done all the access controls.

-- 
paul-moore.com