Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:

> On Wed, Apr 14, 2021 at 2:25 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>>
>> On Wed, Apr 14, 2021 at 09:18:09PM +0200, Toke Høiland-Jørgensen wrote:
>> > "Paul E. McKenney" <paulmck@xxxxxxxxxx> writes:
>> >
>> > > On Wed, Apr 14, 2021 at 08:39:04PM +0200, Toke Høiland-Jørgensen wrote:
>> > >> "Paul E. McKenney" <paulmck@xxxxxxxxxx> writes:
>> > >>
>> > >> > On Wed, Apr 14, 2021 at 10:59:23AM -0700, Alexei Starovoitov wrote:
>> > >> >> On Wed, Apr 14, 2021 at 10:52 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>> > >> >> >
>> > >> >> > > > > > if (num_online_cpus() > 1)
>> > >> >> > > > > > 	synchronize_rcu();
>> > >> >> >
>> > >> >> > In CONFIG_PREEMPT_NONE=y and CONFIG_PREEMPT_VOLUNTARY=y kernels, this
>> > >> >> > synchronize_rcu() will be a no-op anyway due to there only being the
>> > >> >> > one CPU. Or are these failures all happening in CONFIG_PREEMPT=y kernels,
>> > >> >> > and in tests where preemption could result in the observed failures?
>> > >> >> >
>> > >> >> > Could you please send your .config file, or at least the relevant portions
>> > >> >> > of it?
>> > >> >>
>> > >> >> That's my understanding as well. I assumed Toke has preempt=y.
>> > >> >> Otherwise the whole thing needs to be root caused properly.
>> > >> >
>> > >> > Given that there is only a single CPU, I am still confused about what
>> > >> > the tests are expecting the membarrier() system call to do for them.
>> > >>
>> > >> It's basically a proxy for waiting until the objects are freed on the
>> > >> kernel side, as far as I understand...
>> > >
>> > > There are in-kernel objects that are freed via call_rcu(), and the idea
>> > > is to wait until these objects really are freed? Or am I still missing
>> > > out on what is going on?
>> >
>> > Something like that? Although I'm not actually sure these are using
>> > call_rcu()? One of them needs __put_task_struct() to run, and the other
>> > waits for map freeing, with this comment:
>> >
>> >
>> > 	/* we need to either wait for or force synchronize_rcu(), before
>> > 	 * checking for "still exists" condition, otherwise map could still be
>> > 	 * resolvable by ID, causing false positives.
>> > 	 *
>> > 	 * Older kernels (5.8 and earlier) freed map only after two
>> > 	 * synchronize_rcu()s, so trigger two, to be entirely sure.
>> > 	 */
>> > 	CHECK(kern_sync_rcu(), "sync_rcu", "failed\n");
>> > 	CHECK(kern_sync_rcu(), "sync_rcu", "failed\n");
>>
>> OK, so the issue is that the membarrier() system call is designed to force
>> ordering only within a user process, and you need it in the kernel.
>>
>> Give or take my being puzzled as to why the membarrier() system call
>> doesn't do it for you on a CONFIG_PREEMPT_NONE=y system, this brings
>> us back to the question Alexei asked me in the first place, what is the
>> best way to invoke an in-kernel synchronize_rcu() from userspace?
>>
>> You guys gave some reasonable examples. Here are a few others:
>>
>> o	Bring a CPU online, then force it offline, or vice versa.
>> 	But in this case, sys_membarrier() would do what you need
>> 	given more than one CPU.
>>
>> o	Use the membarrier() system call, but require that the tests
>> 	run on systems with at least two CPUs.
>>
>> o	Create a kernel module whose init function does a
>> 	synchronize_rcu() and then returns failure. This will
>> 	avoid the overhead of removing that kernel module.
>>
>> o	Create a sysfs or debugfs interface that does a
>> 	synchronize_rcu().
>>
>> But I am still concerned that you are needing more than synchronize_rcu()
>> can do. Otherwise, the membarrier() system call would work just fine
>> on a single CPU on your CONFIG_PREEMPT_VOLUNTARY=y kernel.
>
> Selftests know internals of kernel implementation and wait for some
> objects to be freed with call_rcu(). So I think at this point the best
> way is just to go back to map-in-map or socket local storage.
> Map-in-map will probably work on older kernels, so I'd stick with that
> (plus all the code is there in the referenced commit). The performance
> and number of syscalls performed doesn't matter, really.

Just tried that (with the patch below, pulled from the commit you
referred to), and that doesn't help. Still get this with a single CPU:

  test_lookup_update:FAIL:map1_leak inner_map1 leaked!
  #15/1 lookup_update:FAIL
  #15 btf_map_in_map:FAIL

It's fine with 2 CPUs. And the other failures (in the task_local_storage
test) seem to have gone away entirely after I just pulled the newest
bpf-next...

-Toke


diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 6396932b97e2..4c26d84a64dc 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -376,7 +376,27 @@ static int delete_module(const char *name, int flags)
  */
 int kern_sync_rcu(void)
 {
-	return syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0, 0);
+	int inner_map_fd, outer_map_fd, err, zero = 0;
+
+	inner_map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, 4, 1, 0);
+	if (!ASSERT_LT(0, inner_map_fd, "inner_map_create"))
+		return -1;
+
+	outer_map_fd = bpf_create_map_in_map(BPF_MAP_TYPE_ARRAY_OF_MAPS, NULL,
+					     sizeof(int), inner_map_fd, 1, 0);
+	if (!ASSERT_LT(0, outer_map_fd, "outer_map_create")) {
+		close(inner_map_fd);
+		return -1;
+	}
+
+	err = bpf_map_update_elem(outer_map_fd, &zero, &inner_map_fd, 0);
+	if (err)
+		err = -errno;
+	ASSERT_OK(err, "outer_map_update");
+	close(inner_map_fd);
+	close(outer_map_fd);
+
+	return err;
 }
 
 static void unload_bpf_testmod(void)
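For reference, the "kernel module whose init function does a
synchronize_rcu() and then returns failure" option mentioned above could
be sketched roughly as follows. This is purely illustrative and not code
from any tree; the file name, function name and the -EAGAIN error value
are made up:

/* sync_rcu_mod.c - illustrative only: force an in-kernel RCU grace
 * period from userspace by loading a module that deliberately fails
 * its init after calling synchronize_rcu().
 */
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/rcupdate.h>

static int __init sync_rcu_mod_init(void)
{
	/* Wait for a full in-kernel RCU grace period. */
	synchronize_rcu();

	/* Fail the load on purpose so no rmmod is needed afterwards. */
	return -EAGAIN;
}

module_init(sync_rcu_mod_init);
MODULE_LICENSE("GPL");

Loading such a module (e.g. with insmod) would fail by design, but by the
time the failing module-load syscall returns to userspace a grace period
has presumably elapsed, and the module never has to be removed.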