Re: Selftest failures related to kern_sync_rcu()

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 14 Apr 2021 15:13:38 -0700

On Wed, Apr 14, 2021 at 2:25 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Wed, Apr 14, 2021 at 09:18:09PM +0200, Toke Høiland-Jørgensen wrote:
> > "Paul E. McKenney" <paulmck@xxxxxxxxxx> writes:
> >
> > > On Wed, Apr 14, 2021 at 08:39:04PM +0200, Toke Høiland-Jørgensen wrote:
> > >> "Paul E. McKenney" <paulmck@xxxxxxxxxx> writes:
> > >>
> > >> > On Wed, Apr 14, 2021 at 10:59:23AM -0700, Alexei Starovoitov wrote:
> > >> >> On Wed, Apr 14, 2021 at 10:52 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > >> >> >
> > >> >> > > > > >                 if (num_online_cpus() > 1)
> > >> >> > > > > >                         synchronize_rcu();
> > >> >> >
> > >> >> > In CONFIG_PREEMPT_NONE=y and CONFIG_PREEMPT_VOLUNTARY=y kernels, this
> > >> >> > synchronize_rcu() will be a no-op anyway due to there only being the
> > >> >> > one CPU.  Or are these failures all happening in CONFIG_PREEMPT=y kernels,
> > >> >> > and in tests where preemption could result in the observed failures?
> > >> >> >
> > >> >> > Could you please send your .config file, or at least the relevant portions
> > >> >> > of it?
> > >> >>
> > >> >> That's my understanding as well. I assumed Toke has preempt=y.
> > >> >> Otherwise the whole thing needs to be root caused properly.
> > >> >
> > >> > Given that there is only a single CPU, I am still confused about what
> > >> > the tests are expecting the membarrier() system call to do for them.
> > >>
> > >> It's basically a proxy for waiting until the objects are freed on the
> > >> kernel side, as far as I understand...
> > >
> > > There are in-kernel objects that are freed via call_rcu(), and the idea
> > > is to wait until these objects really are freed?  Or am I still missing
> > > out on what is going on?
> >
> > Something like that? Although I'm not actually sure these are using
> > call_rcu()? One of them needs __put_task_struct() to run, and the other
> > waits for map freeing, with this comment:
> >
> >
> >       /* we need to either wait for or force synchronize_rcu(), before
> >        * checking for "still exists" condition, otherwise map could still be
> >        * resolvable by ID, causing false positives.
> >        *
> >        * Older kernels (5.8 and earlier) freed map only after two
> >        * synchronize_rcu()s, so trigger two, to be entirely sure.
> >        */
> >       CHECK(kern_sync_rcu(), "sync_rcu", "failed\n");
> >       CHECK(kern_sync_rcu(), "sync_rcu", "failed\n");
>
> OK, so the issue is that the membarrier() system call is designed to force
> ordering only within a user process, and you need it in the kernel.
>
> Give or take my being puzzled as to why the membarrier() system call
> doesn't do it for you on a CONFIG_PREEMPT_NONE=y system, this brings
> us back to the question Alexei asked me in the first place, what is the
> best way to invoke an in-kernel synchronize_rcu() from userspace?
>
> You guys gave some reasonable examples.  Here are a few others:
>
> o       Bring a CPU online, then force it offline, or vice versa.
>         But in this case, sys_membarrier() would do what you need
>         given more than one CPU.
>
> o       Use the membarrier() system call, but require that the tests
>         run on systems with at least two CPUs.
>
> o       Create a kernel module whose init function does a
>         synchronize_rcu() and then returns failure.  This will
>         avoid the overhead of removing that kernel module.
>
> o       Create a sysfs or debugfs interface that does a
>         synchronize_rcu().
>
> But I am still concerned that you are needing more than synchronize_rcu()
> can do.  Otherwise, the membarrier() system call would work just fine
> on a single CPU on your CONFIG_PREEMPT_VOLUNTARY=y kernel.
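
For illustration, Paul's second option (keep membarrier(), but refuse to
rely on it when only one CPU is online) could look roughly like this. It
is a minimal sketch, not the selftests' actual kern_sync_rcu(): the
helper names membarrier_sync_usable() and kern_sync_rcu_membarrier() are
hypothetical, and it assumes Linux headers that provide __NR_membarrier
and MEMBARRIER_CMD_SHARED. It mirrors the in-kernel check quoted above,
where synchronize_rcu() only runs when num_online_cpus() > 1.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if more than one CPU is online, i.e. the
 * case where membarrier() is expected to invoke synchronize_rcu() in the
 * kernel; returns 0 otherwise. */
int membarrier_sync_usable(void)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

	return ncpus > 1;
}

/* Hypothetical membarrier()-based kern_sync_rcu() variant.
 * Returns 0 on success, -EOPNOTSUPP when only one CPU is online (the
 * caller should skip the test rather than fail it), or -errno if the
 * system call itself fails. */
int kern_sync_rcu_membarrier(void)
{
	if (!membarrier_sync_usable())
		return -EOPNOTSUPP;
	if (syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0, 0) < 0)
		return -errno;
	return 0;
}
```

A test would call it and skip on -EOPNOTSUPP. CPUs could go offline
between the check and the syscall, which is probably acceptable for a
test helper. Note that this does not address Paul's remaining concern
that the tests may need more than a single synchronize_rcu().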

The selftests rely on kernel implementation internals and wait for certain
objects to be freed via call_rcu(). So at this point I think the best
option is simply to go back to the map-in-map or socket local storage
approach. Map-in-map will probably work on older kernels, so I'd stick
with that (plus all the code is there in the referenced commit). The
performance and number of syscalls don't really matter here.
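
A rough sketch of the map-in-map approach, assuming that the kernel waits
for an RCU grace period (synchronize_rcu()) before returning from a
BPF_MAP_UPDATE_ELEM on an array-of-maps. It uses the raw bpf(2) syscall
instead of libbpf helpers to stay self-contained; the function names
sys_bpf() and kern_sync_rcu_map_in_map() are hypothetical, and this is
not the code from the referenced commit. Creating the maps typically
requires root privileges, as the BPF selftests already do.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical thin wrapper around the bpf(2) system call. */
static long sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr)
{
	return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Returns 0 on success or -errno on failure. */
int kern_sync_rcu_map_in_map(void)
{
	union bpf_attr attr;
	int inner_fd, outer_fd, err = 0;
	__u32 zero = 0;

	/* Inner map: a one-element array; only its shape matters. */
	memset(&attr, 0, sizeof(attr));
	attr.map_type = BPF_MAP_TYPE_ARRAY;
	attr.key_size = 4;
	attr.value_size = 4;
	attr.max_entries = 1;
	inner_fd = sys_bpf(BPF_MAP_CREATE, &attr);
	if (inner_fd < 0)
		return -errno;

	/* Outer map: array-of-maps using the inner map as its template. */
	memset(&attr, 0, sizeof(attr));
	attr.map_type = BPF_MAP_TYPE_ARRAY_OF_MAPS;
	attr.key_size = 4;
	attr.value_size = 4; /* values are map fds */
	attr.max_entries = 1;
	attr.inner_map_fd = inner_fd;
	outer_fd = sys_bpf(BPF_MAP_CREATE, &attr);
	if (outer_fd < 0) {
		err = -errno;
		close(inner_fd);
		return err;
	}

	/* Store the inner map's fd in slot 0; assumed to make the kernel
	 * wait for an RCU grace period before the syscall returns. */
	memset(&attr, 0, sizeof(attr));
	attr.map_fd = outer_fd;
	attr.key = (__u64)(unsigned long)&zero;
	attr.value = (__u64)(unsigned long)&inner_fd;
	attr.flags = BPF_ANY;
	if (sys_bpf(BPF_MAP_UPDATE_ELEM, &attr) < 0)
		err = -errno;

	close(inner_fd);
	close(outer_fd);
	return err;
}
```

Calling it twice back to back would mirror the double kern_sync_rcu()
in the map-freeing test quoted above.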

>
>                                                         Thanx, Paul