Re: [PATCH bpf-next 3/6] bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen

Martin KaFai Lau <martin.lau@xxxxxxxxx> · Tue, 25 Apr 2023 11:31:27 -0700

On 4/25/23 10:12 AM, Stanislav Fomichev wrote:
On Mon, Apr 24, 2023 at 5:56 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:

On 4/21/23 9:09 AM, Stanislav Fomichev wrote:
On Fri, Apr 21, 2023 at 8:24 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:

On 4/19/23 12:53 AM, Stanislav Fomichev wrote:
Over time, we've found out several special socket option cases which need
special treatment. And if BPF program doesn't handle them correctly, this
might EFAULT perfectly valid {g,s}setsockopt calls.

The intention of the EFAULT was to make it apparent to the
developers that the program is doing something wrong.
However, this inadvertently might affect production workloads
with the BPF programs that are not too careful.

Took in the first two for now. It would be good if the commit description
in here could have more details for posterity given this is too vague.

Thanks! Will try to repost next week with more details.

Let's try to minimize the chance of BPF program screwing up userspace
by ignoring the output of those BPF programs (instead of returning
EFAULT to the userspace). pr_info_ratelimited those cases to
the dmesg to help with figuring out what's going wrong.

Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
Suggested-by: Martin KaFai Lau <martin.lau@xxxxxxxxxx>
Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
---
    kernel/bpf/cgroup.c | 8 ++++++--
    1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a06e118a9be5..af4d20864fb4 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1826,7 +1826,9 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
                ret = 1;
        } else if (ctx.optlen > max_optlen || ctx.optlen < -1) {
                /* optlen is out of bounds */
-             ret = -EFAULT;
+             pr_info_ratelimited(
+                     "bpf setsockopt returned unexpected optlen=%d (max_optlen=%d)\n",
+                     ctx.optlen, max_optlen);

Does it help any regular user if this log message is seen? I kind of doubt it a bit,
it might create more confusion if log gets spammed with it, imo.

Agreed, but we need some way to let the users know that their bpf
program is doing the wrong thing (if they set the optlen too high for
example).

imo, I also think a printk here will be a noise in dmesg most of the time (but
much better than an unexpected -EFAULT).

I was thinking for a v2, maybe print it at least once? Similar to
current bpf_warn_invalid_xdp_action?


Any other better alternatives to expose those events?

Is it possible to only -EFAULT when the bpf prog setting a ctx.optlen larger
than the "original" user provided optlen?

Nevermind the "ctx.optlen larger than the original user provided optlen" part. I 
mis-read something in __cgroup_bpf_run_filter_getsockopt(). max_optlen is the 
right limit that the kernel needs to bound the ctx.optlen written by bpf prog.

Ignore for all other cases that is due to the current PAGE_SIZE limitation?

Should be possible. That "ctx.optlen > PAGE_SIZE && ctx.optlen <
original_optlen" is the condition where we'd silently ignore BPF
output.

and should the -ve ctx.optlen be treated separately? like in 
__cgroup_bpf_run_filter_getsockopt():

	if (ctx.optlen < 0) {
		ret = -EFAULT;
		goto out;
	}
	
	if (optval && ctx.optlen > max_optlen) {
		ret = original_optlen > PAGE_SIZE ? 0 : -EFAULT;
		goto out;
	}

As per above, I'll stick a line to the dmest (similar
bpf_warn_invalid_xdp_action), at least to record that this has
happened once.
LMK if you or Danial still don't see a value in printing this..

pr_info_once? hmm... I think it is ok-ish. At least not a warning.

I think almost all of the time the bpf prog forgets to set it to 0 for the long 
optval that it has no interest in. However, to suppress this pr_info_once, 
setting optlen to 0 will disable the following cgroup-bpf prog from using the 
optval as read-only. The case that the printk that may flag is that the bpf prog 
did indeed want to change the long optval?