On Thu, Feb 06, 2014 at 06:55:01PM +0000, Ramana Radhakrishnan wrote:
> On 02/06/14 18:25, David Howells wrote:
> >
> > Is it worth considering a move towards using C11 atomics and barriers and
> > compiler intrinsics inside the kernel? The compiler _ought_ to be able to do
> > these.
>
>
> It sounds interesting to me, if we can make it work properly and
> reliably. + gcc@xxxxxxxxxxx for others in the GCC community to chip in.

Given my (albeit limited) experience playing with the C11 spec and GCC, I
really think this is a bad idea for the kernel. It seems that nobody really
agrees on exactly how the C11 atomics map to real architectural instructions
on anything but the trivial architectures. For example, should the following
code fire the assert?

    #include <atomic>
    #include <cassert>

    using namespace std;

    extern atomic<int> foo, bar, baz;

    void thread1(void)
    {
        foo.store(42, memory_order_relaxed);
        bar.fetch_add(1, memory_order_seq_cst);
        baz.store(42, memory_order_relaxed);
    }

    void thread2(void)
    {
        while (baz.load(memory_order_seq_cst) != 42) {
            /* do nothing */
        }

        assert(foo.load(memory_order_seq_cst) == 42);
    }

To answer that question, you need to go and look at the definitions of
synchronises-with, happens-before, dependency-ordered-before and a whole pile
of vaguely written waffle to realise that you don't know. Certainly, the code
that arm64 GCC currently spits out would allow the assertion to fire on some
microarchitectures.

There are also so many ways to blow your head off it's untrue. For example,
cmpxchg takes separate memory-order parameters for success and failure, but
then there are restrictions on which orderings you can use for each. It's not
hard to find well-known memory-ordering experts shouting "Just use
memory_order_seq_cst for everything, it's too hard otherwise".
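To make the cmpxchg point concrete, here is a minimal C++11 sketch of a compare-and-exchange loop that passes separate success and failure orderings. The helper bounded_increment() is hypothetical, not a kernel or library API. The standard only permits certain pairings: the failure ordering may not be memory_order_release or memory_order_acq_rel (a failed exchange performs only a load), and C11/C++11 additionally require it to be no stronger than the success ordering.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: increment `counter` only while it is below `limit`.
// Returns true if this call performed an increment, false otherwise.
bool bounded_increment(std::atomic<int>& counter, int limit)
{
    int expected = counter.load(std::memory_order_relaxed);

    while (expected < limit) {
        // Success ordering: acq_rel, since the read-modify-write both
        // reads and publishes a value.
        // Failure ordering: acquire. Passing memory_order_release or
        // memory_order_acq_rel here would be invalid, because a failed
        // exchange is only a load.
        if (counter.compare_exchange_weak(expected, expected + 1,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
            return true;
        }
        // On failure (including a spurious one), `expected` has been
        // updated to the value actually observed, so just retry.
    }

    return false;
}
```

Called five times with a limit of 3 on a counter that starts at 0, the first three calls succeed and the last two return false, leaving the counter at 3. compare_exchange_weak() is used because it sits in a retry loop anyway; it may fail spuriously on LL/SC architectures such as arm64, which the loop tolerates.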
Then there's the fun of load-consume vs load-acquire (arm64 GCC completely
ignores consume at the moment and optimises all of the data dependencies away)
as well as the definition of "data races", which seem to be used as an excuse
to miscompile a program at the earliest opportunity.

Trying to introduce system concepts (writes to devices, interrupts,
non-coherent agents) into this mess is going to be an uphill battle IMHO. I'd
just rather stick to the semantics we have and the asm volatile barriers.

That's not to say I think there's no room for improvement in what we have in
the kernel. Certainly, I'd welcome allowing more relaxed operations on
architectures that support them, but it needs to be something that at least
the different architecture maintainers can understand how to implement
efficiently behind an uncomplicated interface. I don't think that interface is
C11.

Just my thoughts on the matter...

Will