On 06/18, Stanislav Fomichev wrote:
> On 06/18, Alexei Starovoitov wrote:
> > On Mon, Jun 17, 2019 at 11:01:01AM -0700, Stanislav Fomichev wrote:
> > > Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
> > > BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.
> > >
> > > BPF_CGROUP_SETSOCKOPT gets a read-only view of the setsockopt arguments.
> > > BPF_CGROUP_GETSOCKOPT can modify the supplied buffer.
> > > Both of them reuse existing PTR_TO_PACKET{,_END} infrastructure.
> > >
> > > The buffer memory is pre-allocated (because I don't think there is
> > > a precedent for working with __user memory from bpf). This might be
> > > slow to do for each {s,g}etsockopt call, that's why I've added
> > > __cgroup_bpf_prog_array_is_empty that exits early if there is nothing
> > > attached to a cgroup. Note, however, that there is a race between
> > > __cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup
> > > program layout might have changed; this should not be a problem
> > > because in general there is a race between multiple calls to
> > > {s,g}etsockopt and user adding/removing bpf progs from a cgroup.
> > >
> > > The return code of the BPF program is handled as follows:
> > > * 0: EPERM
> > > * 1: success, execute kernel {s,g}etsockopt path after BPF prog exits
> > > * 2: success, do _not_ execute kernel {s,g}etsockopt path after BPF
> > >   prog exits
> > >
> > > Note that if 0 or 2 is returned from BPF program, no further BPF program
> > > in the cgroup hierarchy is executed. This is in contrast with any existing
> > > per-cgroup BPF attach_type.
> >
> > This is drastically different from all other cgroup-bpf progs.
> > I think all programs should be executed regardless of return code.
> > It seems to me that 1 vs 2 difference can be expressed via bpf program logic
> > instead of return code.
> >
> > How about we do what all other cgroup-bpf progs do:
> > "any no is no. all yes is yes"
> > Meaning any ret=0 - EPERM back to user.
> > If all are ret=1 - kernel handles get/set.
> >
> > I think the desire to differentiate 1 vs 2 came from ordering issue
> > on getsockopt.
> > How about for setsockopt all progs run first and then kernel.
> > For getsockopt kernel runs first and then all progs.
> > Then progs will have an ability to overwrite anything the kernel returns.
> Good idea, makes sense. For getsockopt we'd also need to pass the return
> value of the kernel getsockopt to let bpf programs override it, but seems
> doable. Let me play with it a bit; I'll send another version if nothing
> major comes up.
>
> Thanks for another round of review!

One clarification: we'd still probably need to have 3 return codes for
setsockopt:
* any 0 - EPERM
* all 1 - continue with the kernel path (i.e. apply this sockopt as is)
* any 2 - return after all BPF hooks are executed (bypass kernel)
  (any 0 trumps any 2 -> EPERM)

The context is read-only for setsockopt, so it shouldn't be an issue.
Let me know if you have a better idea how to handle that.
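
To make the setsockopt rules above concrete, here is a rough userspace
sketch of how the per-program return codes could be folded into a single
verdict once every attached program has run. This is not code from the
patch: the enum, the function names and the run_kernel out-parameter are
all made up for illustration, and it assumes that values other than 0, 1
and 2 never reach this point (they are treated like 1 here).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical names, for illustration only; not taken from the patch. */
enum sockopt_verdict {
	SOCKOPT_DENY   = 0,	/* any 0: -EPERM back to the user */
	SOCKOPT_KERNEL = 1,	/* all 1: continue with the kernel path */
	SOCKOPT_BYPASS = 2,	/* any 2: return without the kernel path */
};

/*
 * Fold the return codes of all attached programs into one verdict.
 * Every entry is inspected; nothing stops at the first 0 or 2.
 * Precedence: any 0 trumps any 2, which trumps all 1.
 * Assumption: any other value is treated like 1.
 */
static enum sockopt_verdict fold_setsockopt_rets(const int *rets, size_t n)
{
	enum sockopt_verdict v = SOCKOPT_KERNEL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (rets[i] == SOCKOPT_DENY)
			v = SOCKOPT_DENY;
		else if (rets[i] == SOCKOPT_BYPASS && v != SOCKOPT_DENY)
			v = SOCKOPT_BYPASS;
	}
	return v;
}

/*
 * Map the folded verdict to a syscall-style result.
 * *run_kernel is set to 1 only when the kernel handler should still run.
 */
static int setsockopt_result(const int *rets, size_t n, int *run_kernel)
{
	enum sockopt_verdict v = fold_setsockopt_rets(rets, n);

	*run_kernel = (v == SOCKOPT_KERNEL);
	return v == SOCKOPT_DENY ? -EPERM : 0;
}
```

With rets = {1, 2, 1} this gives 0 with run_kernel cleared (bypass);
with rets = {1, 2, 0} it gives -EPERM, since any 0 trumps any 2.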