Re: [PATCH] random: use correct memory barriers for crng_node_pool

Eric Biggers <ebiggers@xxxxxxxxxx> · Thu, 24 Sep 2020 19:09:08 -0700

On Thu, Sep 24, 2020 at 05:59:34PM -0700, Paul E. McKenney wrote:
> On Tue, Sep 22, 2020 at 02:55:58PM -0700, Eric Biggers wrote:
> > On Tue, Sep 22, 2020 at 01:56:28PM -0700, Paul E. McKenney wrote:
> > > > You're missing the point here.  b and c could easily be allocated by a function
> > > > alloc_b() that's in another file.
> > > 
> > > I am still missing something.
> > > 
> > > If by "allocated" you mean something like kmalloc(), the compiler doesn't
> > > know the address.  If you instead mean that there is a function that
> > > returns the address of another translation unit's static variable, then
> > > any needed ordering should preferably be built into that function's API.
> > > Either way, one would hope for some documentation of anything the caller
> > > needed to be careful of.
> > > 
> > > > > Besides which, control dependencies should be used only by LKMM experts
> > > > > at this point.  
> > > > 
> > > > What does that even mean?  Control dependencies are everywhere.
> > > 
> > > Does the following work better for you?
> > > 
> > > "... the non-local ordering properties of control dependencies should be
> > > relied on only by LKMM experts ...".
> > 
> > No.  I don't know what that means.  And I think very few people would know.
> > 
> > I just want to know if I use the one-time init pattern with a pointer to a data
> > structure foo, are the readers using foo_use() supposed to use READ_ONCE() or
> > are they supposed to use smp_load_acquire().
> > 
> > It seems the answer is that smp_load_acquire() is the only safe choice, since
> > foo_use() *might* involve a control dependency, or might in the future since
> > it's part of another kernel subsystem and its implementation could change.
> 
> First, the specific issue of one-time init.
> 
> If you are the one writing the code implementing one-time init, it is your
> choice.  It seems like you prefer smp_load_acquire().  If someone sees
> performance problems due to the resulting memory-barrier instructions,
> they have the option of submitting a patch and either forking the
> implementation or taking your implementation over from you, depending
> on how that conversation goes.

It doesn't matter what I "prefer".  The question is, how to write code that is
actually guaranteed to be correct on all supported Linux architectures, without
assuming internal implementation details of other kernel subsystems.

> Third, your first pointer-based variant works with smp_load_acquire(),
> but could instead use READ_ONCE() as long as it is reworked to something
> like this:
> 
> 	static struct foo *foo;
> 	static DEFINE_MUTEX(foo_init_mutex);
> 
> 	// The returned pointer must be handled carefully in the same
> 	// way as would a pointer returned from rcu_derefeference().
> 	// See Documentation/RCU/rcu_dereference.rst.
> 	struct foo *init_foo_if_needed(void)
> 	{
> 		int err = 0;
> 		struct foo *retp;
> 
> 		/* pairs with smp_store_release() below */
> 		retp = READ_ONCE(foo);
> 		if (retp)
> 			return retp;
> 
> 		mutex_lock(&foo_init_mutex);
> 		if (!foo) {
> 			// The compiler doesn't know the address:
> 			struct foo *p = alloc_foo();
> 
> 			if (p) /* pairs with smp_load_acquire() above */
> 				smp_store_release(&foo, p);
> 			else
> 				err = -ENOMEM;
> 		}
> 		mutex_unlock(&foo_init_mutex);
> 		return err;
> 	}
> 
> There are more than 2,000 instances of the rcu_dereference() family of
> primitives in the v5.8 kernel, so it should not be hard to find people
> who are familiar with it.  TL;DR:  Just dereference the pointer and you
> will be fine.

I don't understand why you are saying READ_ONCE() is safe here.  alloc_foo()
could very well initialize some static data as well as foo itself, and after
'if (retp)', use_foo() can be called that happens to use the static data as well
as foo itself.  That's a control dependency that would require
smp_load_acquire(); is it not?  And if foo is some object managed by another
kernel subsystem, that can easily happen without this code being specifically
aware of the implementation detail that actually causes the control dependency.

Also just because there are 2000 instances of rcu_dereference() doesn't mean
kernel developers understand the pitfalls of using it.  Especially since
real-world problems would only be seen very rarely on specific CPU architectures
and/or if some seemingly unrelated kernel code changes.

> > Let me give an example using spinlock_t, since that's used in crng_node_pool.
> > However, it could be any other data structure too; this is *just an example*.
> > And it doesn't matter if the implementation is currently different; the point is
> > that it's an *implementation*.
> > 
> > The allocation side uses spin_lock_init(), while the read side uses spin_lock().
> > Let's say that some debugging feature is enabled where spin locks use some
> > global debugging information (say, a list of all locks) that gets allocated the
> > first time a spin lock is initialized:
> > 
> > 	static struct spin_lock_debug_info *debug_info;
> > 	static DEFINE_MUTEX(debug_info_alloc_mutex);
> > 
> > 	void spin_lock_init(spinlock_t *lock)
> > 	{
> > 	#ifdef CONFIG_DEBUG_SPIN_LOCKS
> > 		mutex_lock(&debug_info_alloc_mutex);
> > 		if (!debug_info)
> > 			debug_info = alloc_debug_info();
> > 		add_lock(debug_info, lock);
> > 		mutex_unlock(&debug_info_alloc_mutex);
> > 	#endif
> > 		real_spin_lock_init(lock);
> > 	}
> > 
> > 	void spin_lock(spinlock_t *lock)
> > 	{
> > 	#ifdef CONFIG_DEBUG_SPIN_LOCKS
> > 		debug_info->...; # use the debug info
> > 	#endif
> > 		real_spin_lock(lock);
> > 	}
> > 
> > In that case, readers would have a control dependency between the condition of
> > the data struct containing the spinlock_t being non-NULL, and the dereference of
> > debug_info by spin_lock().  So anyone "receiving" a data structure containing a
> > spinlock_t would need to use smp_load_acquire(), not READ_ONCE().
> 
> Sorry, no.
> 
> The user had jolly well better make -very- sure that the call to
> spin_lock_init() is ordered before any call to spin_lock().  Running
> spin_lock() concurrently with spin_lock_init() will bring you nothing
> but sorrow, even without that debug_info control-dependency issue.

The example was one-time init of a data structure containing a spin lock.
Like the patch this thread is about:
https://lkml.kernel.org/lkml/20200916233042.51634-1-ebiggers@xxxxxxxxxx

So you're saying that smp_load_acquire() is needed, since otherwise
spin_lock_init() won't be fully ordered before spin_lock()?

> In the various one-time init examples, the required ordering would be
> correctly supplied if spin_lock_init() was invoked by init_foo() or
> alloc_foo(), depending on the example, and used only after a successful
> return from init_foo_if_needed().  None of these examples rely on control
> dependencies.

... or are you saying that READ_ONCE() provides the required full ordering
between spin_lock_init() and spin_lock()?  If so, why, and is it guaranteed
independent of the implementation of these functions?

Please explain in English.

Thanks.

- Eric