Re: [Question]: BPF_CGROUP_{GET,SET}SOCKOPT handling when optlen > PAGE_SIZE

Stanislav Fomichev <sdf@xxxxxxxxxx> · Wed, 26 Oct 2022 19:03:52 -0700

On Wed, Oct 26, 2022 at 6:14 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> The cgroup-bpf {get,set}sockopt prog is useful for changing the behavior of
> an optname.  The bpf prog usually just handles a few specific optnames and
> ignores most others.  For the optnames that it ignores, it usually does not
> need to change the optlen.  The exception is when optlen > PAGE_SIZE (or
> optval_end - optval).  The bpf prog needs to set the optlen to 0 in this case,
> or else the kernel will return -EFAULT to userspace.  This is usually not
> what the bpf prog wants, because the bpf prog only expects an error to be
> returned to userspace when it has explicitly done 'return 0;' or called
> bpf_set_retval().  If a bpf prog always sets optlen to 0 for the optnames
> that it does not care about, it risks breaking a later bpf prog in the same
> cgroup that may want to change or look at it.
>
> I would like to explore whether there is an easier way for the bpf prog to
> handle it, e.g. does it make sense to track whether the bpf prog has changed
> ctx->optlen before returning -EFAULT to userspace when ctx->optlen > max_optlen?
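A small userspace model of the check described above may make the failure
mode easier to see. This is a hypothetical simplification in plain C, not
kernel code: `model_ctx`, `check_after_prog`, `max_optlen_for`, the two
`prog_*` functions, the `outcome` values and the 4096-byte `PAGE_SZ` are all
made up for illustration. It loosely follows the setsockopt path of that era
(details may differ): the program sees at most one page of optval,
ctx->optlen starts out as the user's optlen, -1 bypasses the kernel handler,
0 means "use the original buffer", and anything above max_optlen is -EFAULT.

```c
#include <assert.h>

#define PAGE_SZ 4096 /* assumed page size for the model */

/* Hypothetical outcomes of the post-program check (setsockopt path). */
enum outcome {
	USE_ORIGINAL,  /* optlen == 0: ignore the BPF buffer, use the user's */
	USE_BPF_BUF,   /* pass the program's buffer and optlen on */
	BYPASS_KERNEL, /* optlen == -1: skip the kernel handler */
	FAIL_EFAULT,   /* optlen out of bounds: -EFAULT to userspace */
};

/* Simplified stand-in for the program context. */
struct model_ctx {
	int optlen; /* starts out as the user-supplied optlen */
};

/* The program only ever sees the first page of optval. */
int max_optlen_for(int user_optlen)
{
	assert(user_optlen >= 0);
	return user_optlen > PAGE_SZ ? PAGE_SZ : user_optlen;
}

/* Check applied after the program returned 1 (allow). */
enum outcome check_after_prog(const struct model_ctx *ctx, int max_optlen)
{
	if (ctx->optlen == -1)
		return BYPASS_KERNEL;
	if (ctx->optlen > max_optlen || ctx->optlen < -1)
		return FAIL_EFAULT;
	if (ctx->optlen == 0)
		return USE_ORIGINAL;
	return USE_BPF_BUF;
}

/* A program that ignores this optname and leaves optlen alone. */
void prog_passthrough(struct model_ctx *ctx)
{
	(void)ctx;
}

/* Today's workaround: reset optlen to 0 for every ignored optname. */
void prog_reset_optlen(struct model_ctx *ctx)
{
	ctx->optlen = 0;
}
```

With an 8192-byte optval, prog_passthrough() leaves ctx->optlen at 8192,
which exceeds the 4096-byte max_optlen, so check_after_prog() returns
FAIL_EFAULT; prog_reset_optlen() turns the same call into USE_ORIGINAL, at
the cost of hiding the real optlen from any later program in the chain.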

Good point on chaining being broken because of this requirement :-/

With tracking, we need to be careful, because the following situation
might be problematic:
Suppose the setsockopt optval is larger than 4k. The program can rewrite
some byte in the first 4k, leave optlen untouched and expect this to work.
Currently, optlen=0 explicitly means "ignore whatever is in the bpf
buffer and use the original one".
If we can have tracking that catches situations like this, we
should be able to drop that optlen=0 requirement.
IIRC, that's the only tricky part.
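One way to picture the tracking idea is a check that records whether the
program wrote to ctx->optlen and whether it wrote into the optval buffer.
The sketch below is purely hypothetical: the `tracked_ctx` fields,
`check_tracked`, the `track_outcome` values and the 4096-byte `PAGE_SZ` are
invented names, and it simply assumes both flags are available, which is
exactly the tricky part for direct writes through the optval pointer. An
untouched context no longer needs optlen=0, an explicit optlen=0 keeps its
current meaning, and the case above (optval edited, optlen still larger
than one page) still fails instead of silently dropping or truncating the
write.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SZ 4096 /* assumed page size for the model */

/* Hypothetical outcomes of the tracked post-program check. */
enum track_outcome {
	TRACK_USE_ORIGINAL,  /* ignore the BPF buffer, use the user's optval */
	TRACK_USE_BPF_BUF,   /* pass the program's buffer and optlen on */
	TRACK_BYPASS_KERNEL, /* optlen == -1: skip the kernel handler */
	TRACK_FAIL_EFAULT,   /* ambiguous or out of bounds: -EFAULT */
};

/* Hypothetical per-invocation state; how optval_written would be
 * collected for writes through ctx->optval is left open here. */
struct tracked_ctx {
	int orig_optlen;     /* optlen passed in by userspace */
	int optlen;          /* ctx->optlen after the program ran */
	bool optlen_written; /* program stored to ctx->optlen */
	bool optval_written; /* program stored into the optval buffer */
};

enum track_outcome check_tracked(const struct tracked_ctx *ctx)
{
	int max_optlen;

	assert(ctx->orig_optlen >= 0);
	/* The program only ever sees the first page of optval. */
	max_optlen = ctx->orig_optlen > PAGE_SZ ? PAGE_SZ : ctx->orig_optlen;

	/* Nothing was touched: no need for the optlen = 0 workaround. */
	if (!ctx->optlen_written && !ctx->optval_written)
		return TRACK_USE_ORIGINAL;

	if (ctx->optlen == -1)
		return TRACK_BYPASS_KERNEL;

	/* An explicit optlen = 0 keeps its current meaning. */
	if (ctx->optlen == 0)
		return TRACK_USE_ORIGINAL;

	/* E.g. a byte in the first page was rewritten but optlen still
	 * covers more than one page: fail rather than drop or truncate. */
	if (ctx->optlen > max_optlen || ctx->optlen < -1)
		return TRACK_FAIL_EFAULT;

	return TRACK_USE_BPF_BUF;
}
```

Under these assumptions, an 8192-byte setsockopt that the program ignores
yields TRACK_USE_ORIGINAL, while the same call with one byte rewritten and
optlen left at 8192 yields TRACK_FAIL_EFAULT.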