On Wed, Nov 6, 2024 at 4:55 PM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote: > > SELinux tracks the number of allocated xfrm_state/xfrm_policy objects > (via the selinux_xfrm_refcount variable) as an input in deciding if peer > labeling should be used. > > However, as a result of commits f35f821935d8 ("tcp: defer skb freeing > after socket lock is released") and 68822bdf76f1 ("net: generalize skb > freeing deferral to per-cpu lists"), freeing of a sk_buff object, which > may hold a reference to an xfrm_state object, can be deferred for > processing on another CPU core, so even after xfrm_state is deleted from > the configuration by userspace, the refcount isn't decremented until the > deferred freeing of relevant sk_buffs happens. On a system with many > cores this can take a very long time (even minutes or more if the system > is not very active), leading to peer labeling being enabled for much > longer than expected. > > Fix this by moving the selinux_xfrm_refcount decrementing to just after > the actual deletion of the xfrm objects rather than waiting for the > freeing to happen. For xfrm_policy it currently doesn't seem to be > necessary, but let's do the same there for consistency and > future-proofing. > > We hit this issue on a specific aarch64 256-core system, where the > sequence of unix_socket/test and inet_socket/tcp/test from > selinux-testsuite [1] would quite reliably trigger this scenario, and a > subsequent sctp/test run would then stumble because the policy for that > test misses some rules that would make it work under peer labeling > enabled (namely it was getting the netif::egress permission denied in > some of the test cases). > > [1] https://github.com/SELinuxProject/selinux-testsuite/ > > Fixes: f35f821935d8 ("tcp: defer skb freeing after socket lock is released") > Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists") > Signed-off-by: Ondrej Mosnacek <omosnace@xxxxxxxxxx> > --- Can we explain why TCP packets sitting in TCP receive queues would need to keep xfrm_state around ? With thousands of TCP sockets. I would imagine that a similar issue would be hit, regardless of f35f821935d8 ("tcp: defer skb freeing after socket lock is released") and 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists") We remove the dst from these incoming packets (see skb_dst_drop() in tcp_data_queue() and tcp_add_backlog()), I do not see how XFRM state could be kept ?