Re: Selftest failures related to kern_sync_rcu()

Toke Høiland-Jørgensen <toke@xxxxxxxxxx> · Wed, 14 Apr 2021 21:18:09 +0200

"Paul E. McKenney" <paulmck@xxxxxxxxxx> writes:

> On Wed, Apr 14, 2021 at 08:39:04PM +0200, Toke Høiland-Jørgensen wrote:
>> "Paul E. McKenney" <paulmck@xxxxxxxxxx> writes:
>> 
>> > On Wed, Apr 14, 2021 at 10:59:23AM -0700, Alexei Starovoitov wrote:
>> >> On Wed, Apr 14, 2021 at 10:52 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>> >> >
>> >> > > > > >                 if (num_online_cpus() > 1)
>> >> > > > > >                         synchronize_rcu();
>> >> >
>> >> > In CONFIG_PREEMPT_NONE=y and CONFIG_PREEMPT_VOLUNTARY=y kernels, this
>> >> > synchronize_rcu() will be a no-op anyway due to there only being the
>> >> > one CPU.  Or are these failures all happening in CONFIG_PREEMPT=y kernels,
>> >> > and in tests where preemption could result in the observed failures?
>> >> >
>> >> > Could you please send your .config file, or at least the relevant portions
>> >> > of it?
>> >> 
>> >> That's my understanding as well. I assumed Toke has preempt=y.
>> >> Otherwise the whole thing needs to be root caused properly.
>> >
>> > Given that there is only a single CPU, I am still confused about what
>> > the tests are expecting the membarrier() system call to do for them.
>> 
>> It's basically a proxy for waiting until the objects are freed on the
>> kernel side, as far as I understand...
>
> There are in-kernel objects that are freed via call_rcu(), and the idea
> is to wait until these objects really are freed?  Or am I still missing
> out on what is going on?

Something like that, although I'm not actually sure these are using
call_rcu(). One of the tests needs __put_task_struct() to run, and the
other waits for map freeing, with this comment:

	/* we need to either wait for or force synchronize_rcu(), before
	 * checking for "still exists" condition, otherwise map could still be
	 * resolvable by ID, causing false positives.
	 *
	 * Older kernels (5.8 and earlier) freed map only after two
	 * synchronize_rcu()s, so trigger two, to be entirely sure.
	 */
	CHECK(kern_sync_rcu(), "sync_rcu", "failed\n");
	CHECK(kern_sync_rcu(), "sync_rcu", "failed\n");
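
For reference, here is a minimal sketch of what those two calls amount to
from userspace, assuming kern_sync_rcu() is just a thin wrapper around the
membarrier() system call mentioned above. The function names below are
made up for illustration and are not the selftest helpers themselves:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the selftests' kern_sync_rcu(): ask the
 * kernel for a global membarrier, which is used here as a proxy for
 * "an RCU grace period has elapsed". Assumption: per the snippet quoted
 * earlier in the thread, the kernel only calls synchronize_rcu() when
 * num_online_cpus() > 1, so on a single-CPU system this may return
 * without waiting for anything.
 *
 * Returns 0 on success, -1 with errno set on failure.
 */
static int sync_rcu_sketch(void)
{
	return (int)syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0, 0);
}

/*
 * Mirror the two back-to-back calls in the test: older kernels are said
 * to free the map only after two synchronize_rcu()s, so trigger two.
 * Stops at the first failure.
 */
static int wait_two_grace_periods_sketch(void)
{
	for (int i = 0; i < 2; i++) {
		if (sync_rcu_sketch() != 0)
			return -1;
	}
	return 0;
}
```

A test would call wait_two_grace_periods_sketch() right before checking
that the map ID no longer resolves. Note that nothing here waits on the
object itself; it only waits for whatever membarrier() decides to wait
for, which seems to be exactly the problem on a single CPU.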

-Toke