On Thu, Oct 27, 2022 at 10:29 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > > On 10/27/22 9:23 AM, Stanislav Fomichev wrote: > > On Wed, Oct 26, 2022 at 11:15 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > >> > >> On 10/26/22 7:03 PM, Stanislav Fomichev wrote: > >>> On Wed, Oct 26, 2022 at 6:14 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > >>>> > >>>> The cgroup-bpf {get,set}sockopt prog is useful to change the optname behavior. > >>>> The bpf prog usually just handles a few specific optnames and ignores most > >>>> others. For the optnames that it ignores, it usually does not need to change > >>>> the optlen. The exception is when optlen > PAGE_SIZE (or optval_end - optval). > >>>> The bpf prog needs to set the optlen to 0 for this case or else the kernel will > >>>> return -EFAULT to the userspace. It is usually not what the bpf prog wants > >>>> because the bpf prog only expects error returning to userspace when it has > >>>> explicitly 'return 0;' or used bpf_set_retval(). If a bpf prog always changes > >>>> optlen for optnames that it does not care to 0, it may risk if the latter bpf > >>>> prog in the same cgroup may want to change/look-at it. > >>>> > >>>> Would like to explore if there is an easier way for the bpf prog to handle it. > >>>> eg. does it make sense to track if the bpf prog has changed the ctx->optlen > >>>> before returning -EFAULT to the user space when ctx.optlen > max_optlen? > >>> > >>> Good point on chaining being broken because of this requirement :-/ > >>> > >>> With tracking, we need to be careful, because the following situation > >>> might be problematic: > >>> Suppose setsockopt is larger than 4k, the program can rewrite some > >>> byte in the first 4k, not touch optlen and expect this to work. > >> > >> If the bpf prog rewrites the first 4k, it must change the ctx.optlen to get it > >> work. Otherwise, the kernel will return -EFAULT because the ctx.optlen is > >> larger than the max_optlen (or optval_end - optval). > >> > >>> Currently, optlen=0 explicitly means "ignore whatever is in the bpf > >>> buffer and use the original one" > If we can have a tracking that catches situations like this - we > >>> should be able to drop that optlen=0 requirement. > >>> IIRC, that's the only tricky part. > >> > >> Ah, I meant, in __cgroup_bpf_run_filter_setsockopt, use "!ctx.optlen_changed && > >> ctx.optlen > max_optlen" test to imply "ignore whatever is in the bpf > >> buffer and use the original one". Add 'bool optlen_changed' to 'struct > >> bpf_sockopt_kern' and set ctx.optlen_changed to true in > >> cg_sockopt_convert_ctx_access() whenever there is BPF_WRITE to ctx.optlen. > >> Would it work or may be I am still missing something in the writing first 4k > >> case above? > > > > What if the program wants to keep optlen as is? Here is the > > hypothetical case: ctx->optlen is 8k, we allocate/expose only the > > first 4k, the program does ctx->optval[0] = 0xff and doesn't change > > the optlen. It wants the rest of the payload to be passed as is with > > only the first byte changed. > > I think we are talking about the same case but we may have different > understanding on how the current __cgroup_bpf_run_filter_setsockopt() is > handling it. > > I don't see the current kernel supports this now. If the bpf prog does not > change the ctx->optlen from 8k to something that is <= 4k, the kernel will just > return -EFAULT in here, no? > else if (ctx.optlen /* 8k */ > max_optlen /* 4k */ || ctx.optlen < -1) { > /* optlen is out of bounds */ > ret = -EFAULT; > } > > or you meant the future change needs to consider this new case and also support > gluing the first 4k (that was exposed to the bpf prog) with the second 4k (that > was not exposed to the bpf prog)? > > > The condition "!ctx.optlen_changed && ctx.optlen > max_optlen" is > > true, so, if we treat this as explicit optlen=0, we ignore the > > program's changes. > > But this is not what the program has intended, right? It wants to > > amend something and pass the rest as is. Right, I'm not talking about how it's handled now. Now optlen > max_optlen triggers EFAULT. But in the future, if we add tracking, we want 'optlen > max_optlen' to behave as explicit 'optlen = 0' as long as the user hasn't changed the optlen _and_ also hasn't changed anything in the buffer. > > It seems like we need to have both optlen_changed and optval_changed. > > If both are false, we should be able to safely do optlen=0 equivalent. > > Tracking only optlen seems to be problematic? >