On Fri, Feb 10, 2012 at 02:51:29AM +0000, Jamie Lokier wrote:
> Paul E. McKenney wrote:
> > On Wed, Feb 01, 2012 at 10:33:58AM +0100, Peter Zijlstra wrote:
> > > Hi all,
> > >
> > > So I was talking to Paul yesterday and he mentioned how the SRCU sync
> > > primitive has to use extra synchronize_sched() calls in order to avoid
> > > smp_rmb() calls in the srcu_read_{un,}lock() calls.
> > >
> > > Now memory barriers are usually explained as observable order between
> > > two (or more) unrelated variables, as Documentation/memory-barriers.txt
> > > does in great detail.
> > >
> > > What I couldn't find in there though, is what happens when both
> > > variables are on the same cacheline. The "The effects of the CPU cache"
> > > and "Cache coherency" sections are closest but leave me wanting on this
> > > point.
> > >
> > > Can we get some implicit behaviour from being on the same cacheline? Or
> > > can this memory access queue still totally wreck the game?
> >
> > I don't know of any guarantees in this area, but am checking with
> > hardware architects for a couple of architectures.
>
> On a related note:
>
>   - What's to stop the compiler optimising away a data dependency,
>     converting it to a speculative control dependency?  Here's a
>     contrived example:
>
>     ORIGINAL:
>
>         int func(int *p)
>         {
>             int index = p[0], first = p[1];
>             read_barrier_depends();  /* do..while(0) on most archs */
>             return max(first, p[index]);
>         }
>
>     OPTIMISED:
>
>         int func(int *p)
>         {
>             int index = p[0], val = p[1];
>             if (index != 1)
>                 val = max(val, p[index]);
>             return val;
>         }
>
>     A quick search of the GCC manual for "speculation" and
>     "speculative" comes up with quite a few hits.  I've no idea if
>     they are relevant.

Well, that would be one reason why I did all that work to get
memory_order_consume into C++11.  ;-)

More seriously, you can defeat some of the speculative optimizations
by using ACCESS_ONCE():

        int index = ACCESS_ONCE(p[0]), first = ACCESS_ONCE(p[1]);

This forces a volatile access which should make the compiler at least
a bit more reluctant to apply speculation optimizations.  And using
rcu_dereference_index_check() in the kernel packages the ACCESS_ONCE()
and the smp_read_barrier_depends().

>   - If I understood correctly, IA64 has explicit special registers to
>     assist data-memory speculation by the compiler.  These would be
>     the ALAT registers.  I don't know if they are used in a way that
>     affects RCU, but they do appear in the GCC machine description,
>     and in the manual some kinds of "data speculative scheduling" are
>     enabled by default.  But read_barrier_depends() is a do {} while
>     on IA64.

As I understand it, the ALAT registers do respect dependency ordering.
But you would need to talk to an IA64 hardware architect and an IA64
compiler expert to get the whole story.

>   - The GCC manual mentions data speculation in conjunction with
>     Blackfin as well.  I have no idea if it's relevant, but Blackfin
>     does at least define read_barrier_depends() in an interesting way,
>     sometimes.

Are there SMP blackfin systems now?  There were not last I checked,
and these issues matter only on SMP.

>   - I read that ARM can do speculative memory loads these days.  It
>     complicates DMA.  But are they implemented by speculative
>     preloading into the cache, or by speculatively executing load
>     instructions whose results are predicated on a control path
>     taken?  If the latter, is an empty read_barrier_depends() still
>     ok on ARM?

But ARM does guarantee dependency ordering, so whatever it does to
speculate, it must validate -- the results must be as if the hardware
had done no speculation.
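
For reference, the ACCESS_ONCE() suggestion above can be sketched in
user space roughly as follows.  This is a minimal sketch, assuming GCC
or Clang (it relies on the __typeof__ extension).  The ACCESS_ONCE()
macro follows the volatile-cast form used in the kernel, while
smp_read_barrier_depends() is stubbed out as a no-op and max_int() is a
hypothetical helper standing in for the kernel's max().  The volatile
accesses only discourage the speculative transformation; nothing here
should be read as a guarantee.

```c
/* User-space sketch only; the kernel supplies its own definitions. */

/* Force a single volatile access to x (needs GCC/Clang __typeof__). */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/*
 * Assumption: stubbed as a no-op here, matching the "do..while(0) on
 * most archs" remark in the quoted example.  Not a real barrier.
 */
#define smp_read_barrier_depends() do { } while (0)

/* Hypothetical helper standing in for the kernel's max() macro. */
static int max_int(int a, int b)
{
        return a > b ? a : b;
}

int func(int *p)
{
        /* Volatile loads: the compiler must perform each of them once. */
        int index = ACCESS_ONCE(p[0]);
        int first = ACCESS_ONCE(p[1]);

        smp_read_barrier_depends();

        /* p[index] carries a data dependency on the load of p[0]. */
        return max_int(first, p[index]);
}
```

Calling func() on an array such as { 2, 5, 9 } reads index 2 from p[0]
and returns the larger of p[1] and p[2].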

                                                        Thanx, Paul