On Wed, Aug 17, 2022 at 4:27 PM Martin KaFai Lau <kafai@xxxxxx> wrote: > > On Wed, Aug 17, 2022 at 03:41:26PM -0700, Stanislav Fomichev wrote: > > On Wed, Aug 17, 2022 at 12:07 PM Martin KaFai Lau <kafai@xxxxxx> wrote: > > > > > > On Tue, Aug 16, 2022 at 01:12:11PM -0700, Stanislav Fomichev wrote: > > > > Apparently, only a small subset of cgroup hooks actually falls > > > > back to cgroup_base_func_proto. This leads to unexpected result > > > > where not all cgroup helpers have bpf_{g,s}et_retval. > > > > > > > > It's getting harder and harder to manage which helpers are exported > > > > to which hooks. We now have the following call chains: > > > > > > > > - cg_skb_func_proto > > > > - sk_filter_func_proto > > > > - bpf_sk_base_func_proto > > > > - bpf_base_func_proto > > > Could you explain how bpf_set_retval() will work with cgroup prog that > > > is not syscall and can return flags in the higher bit (e.g. cg_skb egress). > > > It will be a useful doc to add to the uapi bpf.h for > > > the bpf_set_retval() helper. > > > > I think it's the same case as the case without bpf_set_retval? I don't > > think the flags can be exported via bpf_set_retval, it just lets the > > users override EPERM. > eg. Before, a cg_skb@egress prog returns 3 to mean NET_XMIT_CN. > What if the prog now returns 3 and also bpf_set_retval(-Exxxx). > If I read how __cgroup_bpf_run_filter_skb() uses bpf_prog_run_array_cg() > correctly, __cgroup_bpf_run_filter_skb() will return NET_XMIT_DROP > instead of the -Exxxx. The -Exxxx is probably what the bpf prog > is expecting after calling bpf_set_retval(-Exxxx) ? > Thinking more about it, should __cgroup_bpf_run_filter_skb() always > return -Exxxx whenever a -ve retval is set in bpf_set_retval() ? I think we used to have "return 0/1/2/3" to indicate different conditions but then switched to "return 1/0" + flags. So, technically, "return 3 + bpf_set_retval" is still fundamentally a "return 3" api-wise. I guess we can make bpf_set_retval override that but let me start by trying to document what we currently have. If it turns out to be super ugly, we can try to fix it. (not sure how much of a uapi that is) > > Let me verify and I can add a note to bpf_set_retval uapi definition > > to mention that it just overrides EPERM. bpf_set_retval should > > probably not talk about userspace/syscall and instead use the words > > like "caller". > yeah, it is no longer syscall return value only now.