Re: Selftest failures related to kern_sync_rcu()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 13, 2021 at 1:50 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
>
> > On Thu, Apr 8, 2021 at 12:34 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> >>
> >> Hi Andrii
> >>
> >> I'm getting some selftest failures that all seem to have something to do
> >> with kern_sync_rcu() not being enough to trigger the kernel events that
> >> the selftest expects:
> >>
> >> $ ./test_progs | grep FAIL
> >> test_lookup_update:FAIL:map1_leak inner_map1 leaked!
> >> #15/1 lookup_update:FAIL
> >> #15 btf_map_in_map:FAIL
> >> test_exit_creds:FAIL:null_ptr_count unexpected null_ptr_count: actual 0 == expected 0
> >> #123/2 exit_creds:FAIL
> >> #123 task_local_storage:FAIL
> >> test_exit_creds:FAIL:null_ptr_count unexpected null_ptr_count: actual 0 == expected 0
> >> #123/2 exit_creds:FAIL
> >> #123 task_local_storage:FAIL
> >>
> >> They are all fixed by adding a sleep(1) after the call(s) to
> >> kern_sync_rcu(), so I'm guessing it's some kind of
> >> timing/synchronisation problem. Is there a particular kernel config
> >> that's needed for the membarrier syscall trick to work? I've tried with
> >> various settings of PREEMPT and that doesn't really seem to make any
> >> difference...
> >>
> >
> > If you check kern_sync_rcu(), it relies on membarrier() syscall
> > (passing cmd = MEMBARRIER_CMD_SHARED == MEMBARRIER_CMD_GLOBAL).
> > Now, looking at kernel sources:
> >   - CONFIG_MEMBARRIER should be enabled for that syscall;
> >   - it has some extra conditions:
> >
> >            case MEMBARRIER_CMD_GLOBAL:
> >                 /* MEMBARRIER_CMD_GLOBAL is not compatible with nohz_full. */
> >                 if (tick_nohz_full_enabled())
> >                         return -EINVAL;
> >                 if (num_online_cpus() > 1)
> >                         synchronize_rcu();
> >                 return 0;
> >
> > Could it be that one of those conditions is not satisfied?
>
> Aha, bingo! Found the membarrier syscall stuff, but for some reason
> didn't think to actually read the code of it; and I was running this in
> a VM with a single CPU, adding another fixed this. Thanks! :)
>
> Do you think we could detect this in the tests? I suppose the
> tick_nohz_full_enabled() check should already result in a visible
> failure since that makes the syscall fail; but the CPU thing is silent,
> so it would be nice with a hint. Could kern_sync_rcu() check the CPU
> count and print a warning or fail if it is 1? Or maybe just straight up
> fall back to sleep()'ing?

If membarrier() is unreliable, I guess we can just go back to the
previous way of triggering synchronize_rcu() (create and update
map-in-map element)? See 635599bace25 ("selftests/bpf: Sync RCU before
unloading bpf_testmod") that removed that in favor of membarrier()
syscall.

>
> -Toke
>




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux