On Tue, May 14, 2019 at 10:30:02AM -0700, Stanislav Fomichev wrote: > On 05/14, Alexei Starovoitov wrote: > > On Mon, May 13, 2019 at 11:57 AM Stanislav Fomichev <sdf@xxxxxxxxxxx> wrote: > > > > > > On 05/08, Stanislav Fomichev wrote: > > > > On 05/08, Alexei Starovoitov wrote: > > > > > On Wed, May 08, 2019 at 10:18:41AM -0700, Stanislav Fomichev wrote: > > > > > > Right now we are not using rcu api correctly: we pass __rcu pointers > > > > > > to bpf_prog_array_xyz routines but don't use rcu_dereference on them > > > > > > (see bpf_prog_array_delete_safe and bpf_prog_array_copy in particular). > > > > > > Instead of sprinkling rcu_dereferences, let's just get rid of those > > > > > > __rcu annotations and move rcu handling to a higher level. > > > > > > > > > > > > It looks like all those routines are called from the rcu update > > > > > > side and we can use simple rcu_dereference_protected to get a > > > > > > reference that is valid as long as we hold a mutex (i.e. no other > > > > > > updater can change the pointer, no need for rcu read section and > > > > > > there should not be a use-after-free problem). > > > > > > > > > > > > To be fair, there is currently no issue with the existing approach > > > > > > since the calls are mutex-protected, pointer values don't change, > > > > > > __rcu annotations are ignored. But it's still nice to use proper api. > > > > > > > > > > > > The series fixes the following sparse warnings: > > > > > > > > > > Absolutely not. > > > > > please fix it properly. > > > > > Removing annotations is not a fix. > > > > I'm fixing it properly by removing the annotations and moving lifetime > > > > management to the upper layer. See commits 2-4 where I fix the users, the > > > > first patch is just the "preparation". > > > > > > > > The users are supposed to do: > > > > > > > > mutex_lock(&x); > > > > p = rcu_dereference_protected(prog_array, lockdep_is_held(&x)) > > > > // ... > > > > // call bpf_prog_array helpers while mutex guarantees that > > > > // the object referenced by p is valid (i.e. no need for bpf_prog_array > > > > // helpers to care about rcu lifetime) > > > > // ... > > > > mutex_unlock(&x); > > > > > > > > What am I missing here? > > > > > > Just to give you my perspective on why current api with __rcu annotations > > > is working, but not optimal (even if used from the rcu read section). > > > > > > Sample code: > > > > > > struct bpf_prog_array __rcu *progs = <comes from somewhere>; > > > int n; > > > > > > rcu_read_lock(); > > > n = bpf_prog_array_length(progs); > > > if (n > 0) { > > > // do something with __rcu progs > > > do_something(progs); > > > } > > > rcu_read_unlock(); > > > > > > Since progs is __rcu annotated, do_something() would need to do > > > rcu_dereference again and it might get a different value compared to > > > whatever bpf_prog_array_free got while doing its dereference. > > > > correct and I believe the code deals with it fine. > > cnt could be different between two calls. > Yes, currently there is no problem, all users of these apis > are fine because they are holding a mutex (and are hence in the rcu update > path, i.e. the pointer can't change and they have a consistent view > between the calls). > > For example, we currently have: > > int n1, n2; > mutex_lock(&x); > n1 = bpf_prog_array_length(progs); > n2 = bpf_prog_array_length(progs); > // n1 is guaranteed to be the same as n2 as long as we > // hold a mutex; single updater only > ... > mutex_unlock(&x); > > Versus: > > rcu_read_lock(); > n1 = bpf_prog_array_length(progs); > n2 = bpf_prog_array_length(progs); > // n1 and n2 can be different; rcu_read_lock is all about > // lifetime > ... > rcu_read_unlock(); > > But, as I said, we don't use those __rcu annotated bpf_prog_array > routines in the rcu read section, so we are fine. > > (I'm just showing that potentially there might be a problem if we don't move > rcu management away from bpf_prog_array routines and if someone > decides to call them under rcu_read_lock). > > > > A better way is not to deal with rcu inside those helpers and let > > > higher layers do that: > > > > > > struct bpf_prog_array __rcu *progs = <comes from somewhere>; > > > struct bpf_prog_array *p; > > > int n; > > > > > > rcu_read_lock(); > > > p = rcu_dereference(p); > > > n = bpf_prog_array_length(p); > > > if (n > 0) { > > > do_something(p); // do_something sees the same p as bpf_prog_array_length > > > } > > > rcu_read_unlock(); > > > > > > What do you think? > > > > I'm not sure that generically applicable. > > Which piece of code do you have in mind for such refactoring? > All existing callers of (take a look at patches 2-4): > > * bpf_prog_array_free > * bpf_prog_array_length > * bpf_prog_array_copy_to_user > * bpf_prog_array_delete_safe > * bpf_prog_array_copy_info > * bpf_prog_array_copy > > They are: > > * perf event bpf attach/detach/query (under bpf_event_mutex) > * cgroup attach/detach/query (management of cgroup_bpf->effective, under > cgroup_mutex) > * lirc bpf attach/detach/query (under ir_raw_handler_lock) > > Nobody uses these apis in the rcu read section, so we can remove the > annotations and use rcu_dereference_protected on the higher layer. > > Side bonus is that we can also remove __rcu from cgroup_bpf.inactive > (which is just a temp storage and not updated in the rcu fashion) and > from old_array in activate_effective_progs (which is an on-stack > array and, I guess, only has __rcu annotation to satisfy sparse). > > So nothing is changing functionality-wise, but it becomes a bit easier > to reason about by being more explicit with rcu api. disagree. mutex is necesary.