On Wed, Jul 27, 2022 at 2:21 PM Martin KaFai Lau <kafai@xxxxxx> wrote: > > On Wed, Jul 27, 2022 at 01:39:08PM -0700, Stanislav Fomichev wrote: > > On Wed, Jul 27, 2022 at 11:37 AM Martin KaFai Lau <kafai@xxxxxx> wrote: > > > > > > On Wed, Jul 27, 2022 at 09:47:25AM -0700, sdf@xxxxxxxxxx wrote: > > > > On 07/26, Martin KaFai Lau wrote: > > > > > Most of the codes in bpf_setsockopt(SOL_SOCKET) are duplicated from > > > > > the sock_setsockopt(). The number of supported options are > > > > > increasing ever and so as the duplicated codes. > > > > > > > > > One issue in reusing sock_setsockopt() is that the bpf prog > > > > > has already acquired the sk lock. sockptr_t is useful to handle this. > > > > > sockptr_t already has a bit 'is_kernel' to handle the kernel-or-user > > > > > memory copy. This patch adds a 'is_bpf' bit to tell if sk locking > > > > > has already been ensured by the bpf prog. > > > > > > > > Why not explicitly call it is_locked/is_unlocked? I'm assuming, at some > > > > point, > > > is_locked was my initial attempt. The bpf_setsockopt() also skips > > > the ns_capable() check, like in patch 3. I ended up using > > > one is_bpf bit here to do both. > > > > Yeah, sorry, I haven't read the whole series before I sent my first > > reply. Let's discuss it here. > > > > This reminds me of ns_capable in __inet_bind where we also had to add > > special handling. > > > > In general, not specific to the series, I wonder if we want some new > > in_bpf() context indication and bypass ns_capable() from those > > contexts? > > Then we can do things like: > > > > if (sk->sk_bound_dev_if && !in_bpf() && !ns_capable(net->user_ns, > > CAP_NET_RAW)) > > return ...; > Don't see a way to implement in_bpf() after some thoughts. > Do you have idea ? I wonder if we can cheat a bit with the following: bool setsockopt_capable(struct user_namespace *ns, int cap) { if (!in_task()) { /* Running in irq/softirq -> setsockopt invoked by bpf program. * [not sure, is it safe to assume no regular path leads to setsockopt from sirq?] */ return true; } /* Running in process context, task has bpf_ctx set -> invoked by bpf program. */ if (current->bpf_ctx != NULL) return true; return ns_capable(ns, cap); } And then do /ns_capable/setsockopt_capable/ in net/core/sock.c But that might be more fragile than passing the flag, idk. > > Or would it make things more confusing? > > > > > > > > > > we can have code paths in bpf where the socket has been already locked by > > > > the stack? > > > hmm... You meant the opposite, like the bpf hook does not have the > > > lock pre-acquired before the bpf prog gets run and sock_setsockopt() > > > should do lock_sock() as usual? > > > > > > I was thinking a likely situation is a bpf 'sleepable' hook does not > > > have the lock pre-acquired. In that case, the bpf_setsockopt() could > > > always acquire the lock first but it may turn out to be too > > > pessmissitic for the future bpf_[G]etsockopt() refactoring. > > > > > > or we could do this 'bit' break up (into one is_locked bit > > > for locked and one is_bpf to skip-capable-check). I was waiting until a real > > > need comes up instead of having both bits always true now. I don't mind to > > > add is_locked now since the bpf_lsm_cgroup may come to sleepable soon. > > > I can do this in the next spin.