On Wed, Sep 09, 2015 at 12:56:20PM -0500, Christoph Lameter wrote:
> On Wed, 9 Sep 2015, Dmitry Vyukov wrote:
>
> > > Guess this means that cachelines (A) may not have been be written back to
> > > memory when the pointer to the object is written to another cacheline(B)
> > > and that cacheline B arrives at the other processor first which has
> > > outdated cachelines A in its cache? So the other processor uses the
> > > contents of B to get to the pointer to A but then accesses outdated
> > > information since the object contents cachelines (A) have not arrive there
> > > yet?
> >
> > That's one example.
> > Another example will be that kfree reads size from the object _before_
> > the object to the pointer is read. That sounds crazy, but it as
> > actually possible on Alpha processors.
>
> The size is encoded in the kmem_cache structure which is not changed. How
> can that be relevant?

IIRC, at one point some of the Linux-kernel allocators stored some state
in the object itself.  What is the current state?

> > Another example will be that C compiler lets a store to the object in
> > kmalloc sink below the store of the pointer to the object into global.
>
> Well if the pointer is used nakedly to communicate between threads the
> barriers need to be used but what does this have to do with slabs?

It certainly is something that the users of slabs need to know.
In particular, what exactly are the synchronization requirements that
the slabs place on their users?

Dmitry needs to know this because he is constructing a tool that
automatically locates race conditions, and he needs to know who to
complain to when he finds a race condition that involves slabs and
their users.

Here are some of my guesses, but you are the maintainer, not me.  ;-)

1.      Do there need to be any compiler or CPU barriers between last
        use and free on a single thread?

        Here is an example:

                p = kmalloc(sizeof(*p), GFP_KERNEL);
                if (!p)
                        return NULL;
                initialize_me(p);
                if (do_not_really_need_it(p)) {
                        kfree(p);
                        return NULL;
                }
                return p;

        Suppose that both initialize_me() and do_not_really_need_it()
        are static inline functions, so that all of their loads and
        stores to the structure referenced by p are visible to the
        compiler.  Is the above code correct, or is the user required
        to place something like barrier() before the call to kfree()?

        I would hope that the caller of kfree() need not invoke barrier()
        beforehand, but it is your decision.  If the caller need not
        invoke barrier(), then it might (or might not) need to be
        supplied by the kfree() implementation.  From what I understand,
        Dmitry's tool indicated a barrier() is needed somewhere in this
        code path.

2.      Is it OK to do a hot handoff from kmalloc() on one thread to
        kfree() on another?

        Thread 0:

                gp = kmalloc(sizeof(*gp), GFP_KERNEL);

        Thread 1:

                p = READ_ONCE(gp);
                if (p)
                        kfree(p);

        I would be strongly tempted to just say "no" to this use case
        on the grounds that it is pointless, but you know your users
        better than do I.

3.      The case that Dmitry pointed out was something like the following:

        Thread 0:

                p = kmalloc(sizeof(*p), GFP_KERNEL);
                if (!p)
                        return NULL;
                atomic_set(&p->rc, 1);
                return p;

        Thread 1:

                /* Must own ref to take another. */
                WARN_ON(!atomic_read(&p->rc));
                atomic_inc(&p->rc);

        Thread 2:

                if (atomic_read(&p->rc) == 1 ||
                    atomic_dec_and_test(&p->rc))
                        kfree(p);

        This ends up really being the same as #1 above.

> > > Ok lets say that is the case then any write attempt to A results in an
> > > exclusive cacheline state and at that point the cacheline is going to
> > > reflect current contents. So if kfree would write to the object then it
> > > will have the current information.
> >
> > No, because store to the object can still be pending on another CPU.
>
> That would violate the cache coherency protocol as far as I can tell?

It would, but there are three cases that nevertheless need to be
considered: (1) The pointer is in a different cacheline than is the
pointed-to object, and the ordering of accesses to the pointer and
object matters, (2) The object covers more than one cacheline, and the
ordering of accesses matters, and (3) The fields are accessed using
non-atomic operations and the compiler can see into kfree().

I am most worried about #3.

> > So kfree can get the object in E state in cache, but then another CPU
> > will finally issue the store and overwrite the slab freelist.
>
> Sounds like a broken processor design to me. AFAICT the MESI protocol does
> not allow this.

We really need to focus on specific code sequences.  I suspect that you
guys are talking past each other.

> > > Also what does it matter for kfree since the contents of the object are no
> > > longer in use?
> >
> > I don't understand. First, it is not "not in use" infinitely, it can
> > be in use the very next moment. Also, we don't want corruption of slab
> > freelist as well. And we don't want spurious failure of debug
> > allocator that checks that there no writes after free.
>
> Slab freelists are protected by locks.

Are these locks acquired on the fastpaths?  I was under the impression
that they are not.  That said, I do believe that these locks fully
protect the case where one CPU does kfree() and some other CPU later
returns that same object from kmalloc().

> A processor that can randomly defer writes to cachelines in the face of
> other processors owning cachelines exclusively does not seem sane to me.
> In fact its no longer exclusive.

Welcome to the wonderful world of store buffers, which are present even
on strongly ordered systems such as x86 and the mainframe.

> > > Could you please come up with a concrete example where there is
> > > brokenness that we need to consider.
> >
> > Well, both examples in the first email are broken according to all of
> > Documentation/memory-barriers.txt, Alpha processor manual and C
> > standard (assuming that object passed to kfree must be in "quiescent"
> > state).
> > If you want a description of an exact scenario of how it can break:
> > building of freelist in kfree can be hoisted above check of
> > atomic_read(&pid->count) == 1 on Alpha processors, then the freelist
> > can become corrupted.
>
> Sounds like the atomic_read needs more barriers.

We all know that this won't happen.

                                                        Thanx, Paul
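
For reference, the conventional ordering for the final reference drop in
case #3 can be sketched in userspace C11.  This is only a sketch under
stated assumptions: struct obj, obj_alloc(), obj_get(), and obj_put()
are hypothetical names, malloc()/free() stand in for kmalloc()/kfree(),
and C11 atomics stand in for the kernel's atomic_t operations.  It
deliberately omits the "rc == 1" shortcut from Thread 2, because whether
that shortcut needs additional ordering is exactly the open question in
this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object; not a kernel structure. */
struct obj {
        atomic_int rc;
        int payload;
};

/* Allocate an object holding one reference owned by the caller. */
static struct obj *obj_alloc(int payload)
{
        struct obj *p = malloc(sizeof(*p));

        if (!p)
                return NULL;
        p->payload = payload;
        atomic_init(&p->rc, 1);
        return p;
}

/* Take another reference; the caller must already own one. */
static void obj_get(struct obj *p)
{
        assert(atomic_load_explicit(&p->rc, memory_order_relaxed) > 0);
        atomic_fetch_add_explicit(&p->rc, 1, memory_order_relaxed);
}

/* Drop a reference.  Returns 1 if this call freed the object, else 0. */
static int obj_put(struct obj *p)
{
        /*
         * Release ordering: this thread's earlier accesses to *p are
         * ordered before the decrement.
         */
        if (atomic_fetch_sub_explicit(&p->rc, 1, memory_order_release) != 1)
                return 0;
        /*
         * Acquire fence: pairs with the release decrements done by the
         * other threads, so their accesses to *p happen before free().
         */
        atomic_thread_fence(memory_order_acquire);
        free(p);
        return 1;
}
```

With this shape, the thread that drops the last reference is ordered
after every other holder's last use of the object, in terms of both
compiler and CPU reordering, before the memory goes back to the
allocator.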