Re: Selftest failures related to kern_sync_rcu()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 13, 2021 at 11:58 PM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Tue, Apr 13, 2021 at 1:50 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> >
> > Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
> >
> > > On Thu, Apr 8, 2021 at 12:34 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> > >>
> > >> Hi Andrii
> > >>
> > >> I'm getting some selftest failures that all seem to have something to do
> > >> with kern_sync_rcu() not being enough to trigger the kernel events that
> > >> the selftest expects:
> > >>
> > >> $ ./test_progs | grep FAIL
> > >> test_lookup_update:FAIL:map1_leak inner_map1 leaked!
> > >> #15/1 lookup_update:FAIL
> > >> #15 btf_map_in_map:FAIL
> > >> test_exit_creds:FAIL:null_ptr_count unexpected null_ptr_count: actual 0 == expected 0
> > >> #123/2 exit_creds:FAIL
> > >> #123 task_local_storage:FAIL
> > >> test_exit_creds:FAIL:null_ptr_count unexpected null_ptr_count: actual 0 == expected 0
> > >> #123/2 exit_creds:FAIL
> > >> #123 task_local_storage:FAIL
> > >>
> > >> They are all fixed by adding a sleep(1) after the call(s) to
> > >> kern_sync_rcu(), so I'm guessing it's some kind of
> > >> timing/synchronisation problem. Is there a particular kernel config
> > >> that's needed for the membarrier syscall trick to work? I've tried with
> > >> various settings of PREEMPT and that doesn't really seem to make any
> > >> difference...
> > >>
> > >
> > > If you check kern_sync_rcu(), it relies on membarrier() syscall
> > > (passing cmd = MEMBARRIER_CMD_SHARED == MEMBARRIER_CMD_GLOBAL).
> > > Now, looking at kernel sources:
> > >   - CONFIG_MEMBARRIER should be enabled for that syscall;
> > >   - it has some extra conditions:
> > >
> > >            case MEMBARRIER_CMD_GLOBAL:
> > >                 /* MEMBARRIER_CMD_GLOBAL is not compatible with nohz_full. */
> > >                 if (tick_nohz_full_enabled())
> > >                         return -EINVAL;
> > >                 if (num_online_cpus() > 1)
> > >                         synchronize_rcu();
> > >                 return 0;
> > >
> > > Could it be that one of those conditions is not satisfied?
> >
> > Aha, bingo! Found the membarrier syscall stuff, but for some reason
> > didn't think to actually read the code of it; and I was running this in
> > a VM with a single CPU, adding another fixed this. Thanks! :)
> >
> > Do you think we could detect this in the tests? I suppose the
> > tick_nohz_full_enabled() check should already result in a visible
> > failure since that makes the syscall fail; but the CPU thing is silent,
> > so it would be nice with a hint. Could kern_sync_rcu() check the CPU
> > count and print a warning or fail if it is 1? Or maybe just straight up
> > fall back to sleep()'ing?
>
> If membarrier() is unreliable, I guess we can just go back to the
> previous way of triggering synchronize_rcu() (create and update
> map-in-map element)? See 635599bace25 ("selftests/bpf: Sync RCU before
> unloading bpf_testmod") that removed that in favor of membarrier()
> syscall.

maybe create+free socket_local_storage map ? Few syscalls less.
I guess map_in_map is fine too.

Paul,
What do you suggest to trigger synchronize_rcu() from user space?




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux