On Tue, Mar 09, 2021 at 10:36:06AM -0800, Yonghong Song wrote:
>
> On 3/9/21 10:21 AM, Roman Gushchin wrote:
> > On Mon, Mar 08, 2021 at 09:44:08PM -0800, Yonghong Song wrote:
> > >
> > > On 3/5/21 1:10 PM, Yonghong Song wrote:
> > > >
> > > > On 3/5/21 12:38 PM, Roman Gushchin wrote:
> > > > > On Thu, Mar 04, 2021 at 08:03:33PM +0100, Jiri Olsa wrote:
> > > > > > hi,
> > > > > > I'm getting the attached BUG/crash when running selftests in parallel, like:
> > > > > >
> > > > > >     while :; do ./test_progs -t spinlock; done
> > > > > >     while :; do ./test_progs ; done
> > > > > >
> > > > > > it's the latest bpf-next/master, I can send the .config if needed,
> > > > > > but I don't think there's anything special about it, because I saw
> > > > > > the bug on other servers with different generic configs
> > > > > >
> > > > > > it looks like it's related to cgroup local storage; for some reason
> > > > > > the storage deref returns NULL
> > > > > >
> > > > > > I'm a bit lost in this code, so any help would be great ;-)
> > > > >
> > > > > Hi!
> > > > >
> > > > > I think the patch to blame is df1a2cb7c74b ("bpf/test_run: fix
> > > > > unkillable BPF_PROG_TEST_RUN").
> > > >
> > > > Thanks, Roman. I did some experiments and found that the reason for the
> > > > NULL storage deref is that a tracing program (most likely a kprobe) is
> > > > run after bpf_cgroup_storage_set() is called but before the bpf program
> > > > calls bpf_get_local_storage(). Note that the BPF_PROG_RUN_ARRAY_CHECK
> > > > macro used by trace_call_bpf() does call bpf_cgroup_storage_set().
> > > >
> > > > > Prior to it, we were running the test program in the
> > > > > preempt_disable() && rcu_read_lock() section:
> > > > >
> > > > >     preempt_disable();
> > > > >     rcu_read_lock();
> > > > >     bpf_cgroup_storage_set(storage);
> > > > >     ret = BPF_PROG_RUN(prog, ctx);
> > > > >     rcu_read_unlock();
> > > > >     preempt_enable();
> > > > >
> > > > > So, a percpu variable with a cgroup local storage pointer couldn't
> > > > > go away.
> > > >
> > > > I think even with preempt_disable(), if the bpf program does a map
> > > > lookup and there is a kprobe bpf program on htab_map_lookup_elem(), we
> > > > will have the same issue, as BPF_PROG_RUN_ARRAY_CHECK will call
> > > > bpf_cgroup_storage_set() too. I need to write a test case to confirm
> > > > this though.
> > > >
> > > > > After df1a2cb7c74b we can temporarily enable preemption, so nothing
> > > > > prevents another program from calling bpf_cgroup_storage_set() on
> > > > > the same cpu. I guess that's exactly what happens here.
> > > >
> > > > It is. I confirmed.
> > >
> > > Actually, the failure is not due to breaking up preempt_disable(). Even
> > > with the added cond_resched(), bpf_cgroup_storage_set() still happens
> > > inside the preempt region, so that is okay. What I confirmed is that
> > > changing migrate_{disable,enable}() to preempt_{disable,enable}() fixed
> > > the issue.
> >
> > Hm, how so? If preemption is enabled, another task/bpf program can start
> > executing on the same cpu and set its cgroup storage. I guess it's harder
> > to reproduce, or it will result in (bpf map) memory corruption instead
> > of a panic, but I don't think it's safe.
>
> The code has been refactored recently.
> The following is the code right before the refactoring, to make it
> easier to understand:
>
>     rcu_read_lock();
>     migrate_disable();
>     time_start = ktime_get_ns();
>     for (i = 0; i < repeat; i++) {
>         bpf_cgroup_storage_set(storage);
>
>         if (xdp)
>             *retval = bpf_prog_run_xdp(prog, ctx);
>         else
>             *retval = BPF_PROG_RUN(prog, ctx);
>
>         if (signal_pending(current)) {
>             ret = -EINTR;
>             break;
>         }
>
>         if (need_resched()) {
>             time_spent += ktime_get_ns() - time_start;
>             migrate_enable();
>             rcu_read_unlock();
>
>             cond_resched();
>
>             rcu_read_lock();
>             migrate_disable();
>             time_start = ktime_get_ns();
>         }
>     }
>     time_spent += ktime_get_ns() - time_start;
>     migrate_enable();
>     rcu_read_unlock();
>
> bpf_cgroup_storage_set() is called inside migrate_disable()/migrate_enable().
> Previously it was called inside preempt_disable()/preempt_enable(), so it
> should be fine.

Ah, gotcha, thank you for the explanation!

> > > So migrate_{disable,enable}() is the issue, since any other process (and
> > > its bpf program) can preempt the current process/bpf program and run.
> >
> > Oh, I didn't know about the preempt_{disable,enable}() to
> > migrate_{disable,enable}() change. It's definitely not safe from a cgroup
> > local storage perspective.
> >
> > > Currently, for each program, we set the local storage before the
> > > program runs, and each program may access multiple local storage
> > > maps. So we might need something similar to sk_local_storage.
> > > Considering the possibility of deeply nested migrate_{disable,enable}(),
> > > the cgroup local storage has to be preserved in prog/map data
> > > structures and not as a percpu variable, as it will be very hard
> > > to save and restore percpu variable contents when migration can
> > > happen anywhere. I don't have a concrete design yet, just throwing
> > > some ideas out here.
> >
> > Initially I thought about saving this pointer on the stack, but then we
> > need some sort of gcc/assembly magic to get this pointer from the stack
> > outside of the current scope. At that time we didn't have sleepable
> > programs, so the percpu approach looked simpler and more reliable. Maybe
> > it's time to review it.
>
> Indeed, this is the time.
>
> > > BTW, I sent a patch to prevent tracing programs from messing up
> > > cgroup local storage:
> > > https://lore.kernel.org/bpf/20210309052638.400562-1-yhs@xxxxxx/T/#u
> > > Now all programs that access cgroup local storage should be in
> > > process context, and we don't need to worry about kprobe-induced
> > > percpu local storage access.
> >
> > Thank you! My only issue is that the commit log reads like an optimization
> > (as if we're calling set_cgroup_storage() for no good reason), whereas, if
> > I understand it correctly, it prevents some class of problems.
>
> Yes, it prevents real problems as well. The reason I did not say so is that
> the patch does not really fix the fundamental issue. But it does prevent
> some issues. Let me reword the commit message.

Thank you!
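
For readers following the thread, below is a minimal userspace sketch of the
race being discussed. It is not kernel code: cpu_slot stands in for the percpu
slot that bpf_cgroup_storage_set() writes and the bpf_get_local_storage()
helper reads, and storage_set(), storage_get(), tracing_prog() and
cgroup_prog() are made-up names used only for illustration. It only shows why
a nested program run that also sets the slot, between the outer program's set
and its get, produces the NULL storage deref Jiri reported.

    /* race_model.c - userspace model of the percpu cgroup storage race.
     * Build with: gcc -Wall -o race_model race_model.c
     * None of the names below are kernel identifiers; they only stand in
     * for the percpu slot written by bpf_cgroup_storage_set() and read by
     * the bpf_get_local_storage() helper.
     */
    #include <stdio.h>

    struct storage { int value; };

    /* stands in for the percpu bpf_cgroup_storage[] slot */
    static struct storage *cpu_slot;

    static void storage_set(struct storage *s) { cpu_slot = s; }
    static struct storage *storage_get(void)   { return cpu_slot; }

    /* stands in for a kprobe program run via BPF_PROG_RUN_ARRAY_CHECK,
     * which also sets the slot (it has no cgroup storage of its own) */
    static void tracing_prog(void)
    {
        storage_set(NULL);
    }

    /* stands in for a cgroup program between its storage set and its
     * bpf_get_local_storage() call */
    static void cgroup_prog(void)
    {
        struct storage *s;

        tracing_prog();          /* the nested run clobbers the slot */

        s = storage_get();
        if (!s)
            printf("slot is NULL: this is the NULL storage deref\n");
        else
            printf("value = %d\n", s->value);
    }

    int main(void)
    {
        struct storage mine = { .value = 42 };

        storage_set(&mine);      /* what bpf_test_run() does per iteration */
        cgroup_prog();
        return 0;
    }

In the kernel, per the discussion above, the same window is opened either by
the cond_resched()/migrate_enable() gap in bpf_test_run() or by a kprobe
program run via BPF_PROG_RUN_ARRAY_CHECK on the same cpu.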