Re: Is it OK to pass non-acquired objects to kfree?

"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> · Wed, 9 Sep 2015 11:44:15 -0700

On Wed, Sep 09, 2015 at 12:56:20PM -0500, Christoph Lameter wrote:
> On Wed, 9 Sep 2015, Dmitry Vyukov wrote:
> 
> > > Guess this means that cachelines (A) may not have been be written back to
> > > memory when the pointer to the object is written to another cacheline(B)
> > > and that cacheline B arrives at the other processor first which has
> > > outdated cachelines A in its cache? So the other processor uses the
> > > contents of B to get to the pointer to A but then accesses outdated
> > > information since the object contents cachelines (A) have not arrive there
> > > yet?
> >
> > That's one example.
> > Another example will be that kfree reads size from the object _before_
> > the object to the pointer is read. That sounds crazy, but it as
> > actually possible on Alpha processors.
> 
> The size is encoded in the kmem_cache structure which is not changed. How
> can that be relevant?

IIRC, at one point some of the Linux-kernel allocators stored some
state in the object itself.  What is the current state?

> > Another example will be that C compiler lets a store to the object in
> > kmalloc sink below the store of the pointer to the object into global.
> 
> Well if the pointer is used nakedly to communicate between threads the
> barriers need to be used but what does this have to do with slabs?

It certainly is something that the users of slabs need to know.
In particular, what exactly are the synchronization requirements that
the slabs place on their users?  Dmitry needs to know this because he
is constructing a tool that automatically locates race conditions, and
he needs to know who to complain to when he finds a race condition that
involves slabs and their users.

Here are some of my guesses, but you are the maintainer, not me.  ;-)

1.	Do there need to be any compiler or CPU barriers between
	last use and free on a single thread?  Here is an example:

	p = kmalloc(sizeof(*p), GFP_KERNEL);
	if (!p)
		return NULL;
	initialize_me(p);
	if (do_not_really_need_it(p)) {
		kfree(p);
		return NULL;
	}
	return p;

	Suppose that both initialize_me() and do_not_really_need_it()
	are static inline functions, so that all of their loads and
	stores to the structure referenced by p are visible to the
	compiler.  Is the above code correct, or is the user required
	to place something like barrier() before the call to kfree()?

	I would hope that the caller of kfree() need not invoke barrier()
	beforehand, but it is your decision.  If the caller need not
	invoke barrier(), then it might (or might not) need to be supplied
	by the kfree() implementation.	From what I understand, Dmitry's
	tool indicated a barrier() is needed somewhere in this code path.

2.	Is it OK to do a hot handoff from kmalloc() on one thread to
	kfree on another?

	Thread 0:

		gp = kmalloc(sizeof(*gp), GFP_KERNEL);

	Thread 1:

		p = READ_ONCE(gp);
		if (gp)
			kfree(gp);

	I would be strongly tempted to just say "no" to this use case
	on the grounds that it is pointless, but you know your users
	better than do I.

3.	The case that Dmitry pointed out was something like the following:

	Thread 0:

		p = kmalloc(sizeof(*p), GFP_KERNEL);
		if (!p)
			return NULL;
		atomic_set(&p->rc, 1);
		return p;

	Thread 1:

		WARN_ON(!p->rc);  /* Must own ref to take another. */
		atomic_inc(&p->rc);

	Thread 2:

		if (p->rc == 1 ||
		    atomic_dec_and_test(&p->rc))
		    	kfree(p);

	This ends up really being the same as #1 above.

> > > Ok lets say that is the case then any write attempt to A results in an
> > > exclusive cacheline state and at that point the cacheline is going to
> > > reflect current contents. So if kfree would write to the object then it
> > > will have the current information.
> >
> > No, because store to the object can still be pending on another CPU.
> 
> That would violate the cache coherency protocol as far as I can tell?

It would, but there are three cases that neverthess need to be considered:
(1) The pointer is in a different cacheline than is the pointed-to object,
and ordering of accesses to the pointer and object matter, (2) The object
covers more than one cacheline, and the ordering of accesses matters,
and (3) The fields are accessed using non-atomic operations and the
compiler can see into kfree().  I am most worried about #3.

> > So kfree can get the object in E state in cache, but then another CPU
> > will finally issue the store and overwrite the slab freelist.
> 
> Sounds like a broken processor design to me. AFAICT the MESI protocol does
> not allow this.

We really need to focus on specific code sequences.  I suspect that you
guys are talking past each other.

> > > Also what does it matter for kfree since the contents of the object are no
> > > longer in use?
> >
> > I don't understand. First, it is not "not in use" infinitely, it can
> > be in use the very next moment. Also, we don't want corruption of slab
> > freelist as well. And we don't want spurious failure of debug
> > allocator that checks that there no writes after free.
> 
> Slab freelists are protected by locks.

Are these locks acquired on the fastpaths?  I was under the impression
that they are not.  That said, I do believe that these locks fully
protect the case where one CPU does kfree() and some other CPU later
returns that same object from kmalloc().

> A processor that can randomly defer writes to cachelines in the face of
> other processors owning cachelines exclusively does not seem sane to me.
> In fact its no longer exclusive.

Welcome to the wonderful world of store buffers, which are present even
on strongly ordered systems such as x86 and the mainframe.

> > > Could you please come up with a concrete example where there is
> > > brokenness that we need to consider.
> >
> > Well, both examples in the first email are broken according to all of
> > Documentation/memory-barriers.txt, Alpha processor manual and C
> > standard (assuming that object passed to kfree must be in "quiescent"
> > state).
> > If you want a description of an exact scenario of how it can break:
> > building of freelist in kfree can be hoisted above check of
> > atomic_read(&pid->count) == 1 on Alpha processors, then the freelist
> > can become corrupted.
> 
> Sounds like the atomic_read needs more barriers.

We all know that this won't happen.

							Thanx, Paul

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>