On Thu, Sep 22, 2022 at 6:12 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 9/22/22 5:12 PM, Alexei Starovoitov wrote:
[...]
> > Instead of adding 4 bytes for enum in patch 3
> > (which will be 8 bytes due to alignment)
> > and abusing bpf_cookie here
> > (which struct_ops bpf prog might eventually read and be surprised
> > to find sk pointer in there)
> > how about adding 'struct task_struct *saved_current' as another arg
> > to bpf_tramp_run_ctx ?
> > Always store the current task in there in prog_entry_struct_ops
> > and then compare it here in this specialized bpf_init_ops_setsockopt?
> >
> > Or maybe always check in enter_prog_struct_ops:
> > if (container_of(current->bpf_ctx, struct bpf_tramp_run_ctx,
> >                  run_ctx)->saved_current == current) // goto out since recursion?
> > it will prevent issues in case we don't know about and will
> > address the good recursion case as explained in patch 1?
> > I'm assuming 2nd ssthresh runs in a different task..
> > Or is it actually the same task?
>
> The 2nd ssthresh() should run in the same task but different sk.  The
> first ssthresh(sk[1]) was run in_task() context and then got
> interrupted.  The softirq then handles the rcv path which just happens
> to also call ssthresh(sk[2]) in the unlikely pkt-loss case.  It is like
> ssthresh(sk[1]) => softirq => ssthresh(sk[2]).
>
> The tcp-cc ops can recur but cannot recur on the same sk because it
> requires holding the sk lock, so the patch remembers what the previous
> sk was to ensure it does not recur on the same sk.  Then it needs to
> peek into the previous run ctx, which may not always be a
> bpf_tramp_run_ctx.  eg. a cg bpf prog (with bpf_cg_run_ctx) can call
> bpf_setsockopt(TCP_CONGESTION, "a_bpf_tcp_cc") which then will call the
> a_bpf_tcp_cc->init().  It needs a bpf_run_ctx_type so it can safely peek
> at the previous bpf_run_ctx.
>
> Since struct_ops is the only one that needs to peek into the previous
> run_ctx (through tramp_run_ctx->saved_run_ctx), instead of adding 4
> bytes to the bpf_run_ctx, one idea that just came to my mind is to use
> one bit in the tramp_run_ctx->saved_run_ctx pointer itself.  Something
> like this if it reuses the bpf_cookie (probably missed some int/ptr
> type casting):
>
> #define BPF_RUN_CTX_STRUCT_OPS_BIT 1UL
>
> u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog,
> 					struct bpf_tramp_run_ctx *run_ctx)
> 	__acquires(RCU)
> {
> 	rcu_read_lock();
> 	migrate_disable();
>
> 	run_ctx->saved_run_ctx = bpf_set_run_ctx((&run_ctx->run_ctx) |
> 						 BPF_RUN_CTX_STRUCT_OPS_BIT);
>
> 	return bpf_prog_start_time();
> }
>
> BPF_CALL_5(bpf_init_ops_setsockopt, struct sock *, sk, int, level,
> 	   int, optname, char *, optval, int, optlen)
> {
> 	/* ... */
> 	if (unlikely((run_ctx->saved_run_ctx &
> 		      BPF_RUN_CTX_STRUCT_OPS_BIT) && ...)) {
> 		/* ... */
> 		if (bpf_cookie == (uintptr_t)sk)
> 			return -EBUSY;
> 	}
>
> }

If I understand correctly, the purpose of adding a field in run_ctx is
to identify the enclosing type of a generic bpf_run_ctx. In the
following lines:

> >> +	if (unlikely(run_ctx->saved_run_ctx &&
> >> +		     run_ctx->saved_run_ctx->type == BPF_RUN_CTX_TYPE_STRUCT_OPS)) {
> >> +		saved_run_ctx = (struct bpf_tramp_run_ctx *)run_ctx->saved_run_ctx;

the enclosing type of run_ctx->saved_run_ctx is a bpf_tramp_run_ctx, so
we can safely cast it and further check sk. The best way I can come up
with is also what Martin suggests: encoding the type information in the
lower bits of the saved_run_ctx field.

Hao
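
For reference, a minimal userspace sketch of the pointer-tagging idea
discussed above: the low bit of the saved run-ctx pointer records that
the previous context was a struct_ops (trampoline) run ctx, so a callee
can safely cast it back and compare the remembered sk. The struct
layout, the helper names tag_struct_ops/untag_struct_ops, and the
stand-in sk value are simplified assumptions for illustration, not the
kernel's actual definitions:

#include <stdint.h>
#include <stdio.h>

#define RUN_CTX_STRUCT_OPS_BIT 1UL

struct run_ctx {		/* stand-in for struct bpf_run_ctx */
	int dummy;
};

struct tramp_run_ctx {		/* stand-in for struct bpf_tramp_run_ctx */
	struct run_ctx run_ctx;		/* must be first for the cast below */
	struct run_ctx *saved_run_ctx;	/* low bit may carry the tag */
	uintptr_t bpf_cookie;		/* remembers the sk pointer */
};

/* Tag a run_ctx pointer as "enclosed in a tramp_run_ctx". */
static struct run_ctx *tag_struct_ops(struct run_ctx *ctx)
{
	return (struct run_ctx *)((uintptr_t)ctx | RUN_CTX_STRUCT_OPS_BIT);
}

/* Return the enclosing tramp_run_ctx if the tag bit is set, else NULL. */
static struct tramp_run_ctx *untag_struct_ops(struct run_ctx *saved)
{
	uintptr_t p = (uintptr_t)saved;

	if (!(p & RUN_CTX_STRUCT_OPS_BIT))
		return NULL;
	return (struct tramp_run_ctx *)(p & ~RUN_CTX_STRUCT_OPS_BIT);
}

int main(void)
{
	struct tramp_run_ctx outer = { .bpf_cookie = 0x1000 /* "sk" */ };
	struct tramp_run_ctx inner;

	/* Entering a nested struct_ops prog: save the tagged previous ctx. */
	inner.saved_run_ctx = tag_struct_ops(&outer.run_ctx);

	/* In the setsockopt helper: peek at the previous ctx safely. */
	struct tramp_run_ctx *prev = untag_struct_ops(inner.saved_run_ctx);

	if (prev && prev->bpf_cookie == 0x1000) {
		printf("recursion on the same sk, would return -EBUSY\n");
		return 1;
	}
	printf("different sk (or no struct_ops caller), proceed\n");
	return 0;
}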