Re: [PATCH bpf-next v3 1/2] net: bpf: Always call BPF cgroup filters for egress.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 6/22/23 13:06, Daniel Borkmann wrote:
On 6/22/23 8:28 PM, Yonghong Song wrote:
On 6/22/23 10:15 AM, Kui-Feng Lee wrote:
On 6/21/23 20:37, Yonghong Song wrote:
On 6/20/23 10:14 AM, Kui-Feng Lee wrote:
Always call BPF filters if CGROUP BPF is enabled for EGRESS without
checking skb->sk against sk.

The filters were called only if skb is owned by the sock that the
skb is sent out through.  In another words, skb->sk should point to
the sock that it is sending through its egress.  However, the filters would miss SYNACK skbs that they are owned by a request_sock but sent through
the listening sock, that is the socket listening incoming connections.
This is an unnecessary restrict.

The original patch which introduced 'sk == skb->sk' is
   3007098494be  cgroup: add support for eBPF programs
There are no mentioning in commit message why 'sk == skb->sk'
is needed. So it is possible that this is just restricted
for use cases at that moment. Now there are use cases
where 'sk != skb->sk' so removing this check can enable
the new use case. Maybe you can add this into your commit
message so people can understand the history of 'sk == skb->sk'.

After checking the code and the Alexei's comment[1] again, this check
may be different from what I thought. In another post[2],
Daniel Borkmann mentioned

     Wouldn't that mean however, when you go through stacked devices that
     you'd run the same eBPF cgroup program for skb->sk multiple times?

I read this paragraph several times.
This check ensures the filters are only called for the device on
the top of a stack.  So, I probably should change the check to

     sk == skb_to_full_sk(skb)

I think this should work. It exactly covers your use case:
   they are owned by a request_sock but sent through
   the listening sock, that is the socket listening incoming connections
and sk == skb->sk for non request_sock/listening_sock case.

Just a thought, should the test look like the below?

        int __ret = 0;                                                         \         if (cgroup_bpf_enabled(CGROUP_INET_EGRESS) && sk) {                    \                 typeof(sk) __sk = sk_to_full_sk(sk);                           \                 if (sk_fullsock(__sk) && __sk == skb_to_full_sk(skb) &&        \                     cgroup_bpf_sock_enabled(__sk, CGROUP_INET_EGRESS))         \                         __ret = __cgroup_bpf_run_filter_skb(__sk, skb,         \ CGROUP_INET_EGRESS); \ }                                                                      \

Iow, we do already convert __sk to full sk, so we should then also use that
for the test with skb_to_full_sk(skb).

Agree!


Thanks,
Daniel




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux