On Wed, Jun 15, 2022 at 9:45 AM Maciej Żenczykowski <zenczykowski@xxxxxxxxx> wrote:
>
> Are you folks aware that:
>
> 'bpf: Move rcu lock management out of BPF_PROG_RUN routines'
>
> fixes a weird regression where sendmsg with an egress tc bpf program
> denying it was returning EFAULT instead of EPERM
>
> I've confirmed vanilla 5.18.0 is broken, and all it takes is
> cherrypicking that specific stable 5.18.x patch [
> 710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour.
>
> This was not a flaky failure... but a 100% reproducible behavioural
> breakage/failure in the test case at
> https://cs.android.com/android/platform/superproject/+/master:kernel/tests/net/test/bpf_test.py;l=517
> (where 5.18 would return EFAULT instead of EPERM)

I bisected on 5.18.x to find the fixing CL, so I don't know which CL
actually caused the breakage.

sdf says:
  5.15 is where they rewrote defines to funcs, so there is still
  something else involved it seems
  b8bd3ee1971d1edbc53cf322c149ca0227472e56 this is where we added
  EFAULT in 5.16
  (we've added a mechanism to return custom errno, I wonder if some of
  that is related)

and that this EFAULT breakage is not something he was expecting to
fix... so it's some sort of unintended consequence.

I recall that:
- vanilla 5.15 and 5.16 are definitely good
- I think the only regression in 5.17 is an unrelated icmp socket one
  - so from a bpf perspective it was also good.
- 5.18 had 3 regressions: icmp sockets, the pf_key regression (fixed
  via revert in 5.18.4) plus this bpf one

The bad pf_key change being reverted in 5.18.4 is why I even switched
from dev/test against 5.18 to against 5.18.4 and noticed that this was
already fixed before I could even report it...
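
For anyone wanting to check a given kernel quickly, here is a rough sketch of the errno check in question. It is not the code from bpf_test.py; the helper names send_errno and classify are made up for illustration, and it assumes you already have a socket whose egress path has a denying bpf program attached (that setup is not shown).

```python
import errno


def send_errno(send, *args):
    """Call send(*args); return the errno it failed with, or 0 on success."""
    try:
        send(*args)
    except OSError as e:
        return e.errno
    return 0


def classify(err):
    """Map the errno from a denied send to a verdict (hypothetical helper)."""
    if err == errno.EPERM:
        return "ok"          # expected: the egress program denied the packet
    if err == errno.EFAULT:
        return "regression"  # what vanilla 5.18.0 was observed to return
    return "unexpected"      # success or some other errno
```

Usage would be something like classify(send_errno(sock.sendto, b"x", addr)), where sock and addr are whatever your own test setup provides.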