On Fri, Jul 21, 2017 at 07:52:03AM +0900, Akira Yokosawa wrote:
> On 2017/07/20 14:42:34 -0700, Paul E. McKenney wrote:
> > On Fri, Jul 21, 2017 at 06:12:56AM +0900, Akira Yokosawa wrote:
> >> On 2017/07/20 09:11:52AM -0700, Paul E. McKenney wrote:
> >>> On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote:
> >>>> On 2017/07/20 14:47, Paul E. McKenney wrote:
> >>>>> On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:
> >>>>>> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:
> >>>>>>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:
> >>>>>>>> On 2017/07/20 2:43, Paul E. McKenney wrote:
> >>>>>>>>> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:
> >>>>>>>>>> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001
> >>>>>>>>>> From: Akira Yokosawa <akiyks@xxxxxxxxx>
> >>>>>>>>>> Date: Mon, 17 Jul 2017 16:25:33 +0900
> >>>>>>>>>> Subject: [PATCH] documentation: Fix two-CPU control-dependency example
> >>>>>>>>>>
> >>>>>>>>>> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering
> >>>>>>>>>> no-transitivity example"), the operator in the "if" statement of
> >>>>>>>>>> the two-CPU example was changed from ">=" to ">".
> >>>>>>>>>> Now the example misses the point, because there is no one to
> >>>>>>>>>> modify "x" or "y", so each CPU performs only the READ_ONCE().
> >>>>>>>>>>
> >>>>>>>>>> The point of this example is to use a control dependency for
> >>>>>>>>>> ordering, so the WRITE_ONCE() should always be executed.
> >>>>>>>>>>
> >>>>>>>>>> The example was therefore correct prior to the above-mentioned
> >>>>>>>>>> commit.  A partial revert of the commit (with context adjustments
> >>>>>>>>>> for other changes made since) restores the point.
> >>>>>>>>>>
> >>>>>>>>>> Note that the three-CPU example demonstrating the lack of
> >>>>>>>>>> transitivity stands regardless of this partial revert.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Akira Yokosawa <akiyks@xxxxxxxxx>
> >>>>>>>>>
> >>>>>>>>> Hello, Akira,
> >>>>>>>>>
> >>>>>>>>> You are quite right that many compilers would generate straightforward
> >>>>>>>>> code for the code fragment below, and in that case, the assertion could
> >>>>>>>>> never trigger due to either TSO or control dependencies, depending on
> >>>>>>>>> the architecture this was running on.
> >>>>>>>>>
> >>>>>>>>> However, if the compiler was too smart for our good, it could figure
> >>>>>>>>> out that "x" and "y" only take on the values zero and one, so that
> >>>>>>>>> the "if" would always be taken.  At that point, the compiler could
> >>>>>>>>> simply ignore the "if", with the result shown below.
> >>>>>>>>>
> >>>>>>>>>> ---
> >>>>>>>>>>  Documentation/memory-barriers.txt | 2 +-
> >>>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> >>>>>>>>>> index c4ddfcd..c1ebe99 100644
> >>>>>>>>>> --- a/Documentation/memory-barriers.txt
> >>>>>>>>>> +++ b/Documentation/memory-barriers.txt
> >>>>>>>>>> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of
> >>>>>>>>>>  	CPU 0                     CPU 1
> >>>>>>>>>>  	=======================   =======================
> >>>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
> >>>>>>>>>> -	if (r1 > 0)               if (r2 > 0)
> >>>>>>>>>> +	if (r1 >= 0)              if (r2 >= 0)
> >>>>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
> >>>>>>>>>>
> >>>>>>>>>>  	assert(!(r1 == 1 && r2 == 1));
> >>>>>>>>>
> >>>>>>>>> Original program:
> >>>>>>>>>
> >>>>>>>>> 	CPU 0                     CPU 1
> >>>>>>>>> 	=======================   =======================
> >>>>>>>>> 	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
> >>>>>>>>> 	if (r1 >= 0)              if (r2 >= 0)
> >>>>>>>>> 	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
> >>>>>>>>>
> >>>>>>>>> 	assert(!(r1 == 1 && r2 == 1));
> >>>>>>>>>
> >>>>>>>>> Enthusiastically optimized program:
> >>>>>>>>>
> >>>>>>>>> 	CPU 0                     CPU 1
> >>>>>>>>> 	=======================   =======================
> >>>>>>>>> 	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
> >>>>>>>>> 	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
> >>>>>>>>>
> >>>>>>>>> 	assert(!(r1 == 1 && r2 == 1));
> >>>>>>>>>
> >>>>>>>>> This optimized version could trigger the assertion.
> >>>>>>>>>
> >>>>>>>>> So we do need to stick with the ">" comparison.
> >>>>>>>>
> >>>>>>>> Well but,
> >>>>>>>>
> >>>>>>>> Current example:
> >>>>>>>>
> >>>>>>>> 	CPU 0                     CPU 1
> >>>>>>>> 	=======================   =======================
> >>>>>>>> 	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
> >>>>>>>> 	if (r1 > 0)               if (r2 > 0)
> >>>>>>>> 	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
> >>>>>>>>
> >>>>>>>> 	assert(!(r1 == 1 && r2 == 1));
> >>>>>>>>
> >>>>>>>> Such a clever compiler might be able to prove that "x" and "y"
> >>>>>>>> are never modified and end up with the following:
> >>>>>>>>
> >>>>>>
> >>>>>> Hi Akira,
> >>>>>>
> >>>>>> I wouldn't call that compiler a clever one, it's a broken one ;-)
> >>>>>>
> >>>>>> So here is the thing: READ_ONCE() is a *volatile* load, which means the
> >>>>>> compiler has to generate code that actually does a load, so the values
> >>>>>> of r1 and r2 depend on the loads.  Therefore, a sane compiler will not
> >>>>>> optimize the if()s out, because of the volatile semantics of READ_ONCE().
> >>>>>> Otherwise, I think we have a lot more to worry about than this case.
> >>>>>>
> >>>>>>>> 	CPU 0                     CPU 1
> >>>>>>>> 	=======================   =======================
> >>>>>>>> 	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
> >>>>>>>>
> >>>>>>>> 	assert(!(r1 == 1 && r2 == 1));
> >>>>>>>>
> >>>>>>>> This means it is impossible to describe this example in C,
> >>>>>>>> doesn't it?
> >>>>>>>
> >>>>>>> That is a matter of some debate, but it has gotten increasingly more
> >>>>>>> difficult to get C to do one's bidding over the decades.
> >>>>>>>
> >>>>>>>> What am I missing here?
> >>>>>>>
> >>>>>>> The compiler has to work harder in your example case, so it is probably
> >>>>>>> just a matter of time.  In the original with ">=", all the compiler needed
> >>>>>>> to do was look at all the assignments and the initial value.  In the
> >>>>>>> original, it also had to do reasoning based on control dependencies
> >>>>>>> (which are not yet part of the C standard).
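To make Boqun's point about volatile concrete: READ_ONCE() and WRITE_ONCE()
are, at bottom, volatile accesses.  A minimal sketch of the idea is shown
below -- the in-tree definitions in include/linux/compiler.h carry extra
machinery for the various access sizes, so treat this as an illustration
rather than the actual macros:

	/*
	 * Simplified sketch of READ_ONCE()/WRITE_ONCE().  The core is a
	 * cast to a volatile-qualified pointer, which is what obliges the
	 * compiler to emit a real load or store instruction for each use.
	 */
	#define READ_ONCE(x) \
		(*(const volatile __typeof__(x) *)&(x))

	#define WRITE_ONCE(x, val) \
		do { *(volatile __typeof__(x) *)&(x) = (val); } while (0)

The sticking point discussed above is that the volatile access forces the
load itself to be emitted, but (on at least one reading of the standard) it
places no obligation on the compiler to let the loaded value, rather than
its own value-range reasoning, decide the subsequent branch.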
> >>>>>>>
> >>>>>>> But yes, the degree to which a compiler can optimize atomics is a matter
> >>>>>>> of some debate.  Here is a proposal to let the developer choose:
> >>>>>>>
> >>>>>>
> >>>>>> Hi Paul,
> >>>>>>
> >>>>>> I know the compiler could optimize atomics in very interesting ways, but
> >>>>>> this case is about volatile, so I guess our case is still fine? ;-)
> >>>>>
> >>>>> Hello, Boqun,
> >>>>>
> >>>>> When I asked that question, the answer I got was "the compiler must
> >>>>> emit the load instruction, but is under no obligation to actually use the
> >>>>> value loaded".
> >>>>
> >>>> So, it sounds like the following description found in memory-barriers.txt
> >>>> just above the example of lack of transitivity:
> >>>>
> >>>>> However, stores are not speculated.  This means that ordering -is- provided
> >>>>> for load-store control dependencies, as in the following example:
> >>>>>
> >>>>> 	q = READ_ONCE(a);
> >>>>> 	if (q) {
> >>>>> 		WRITE_ONCE(b, 1);
> >>>>> 	}
> >>>>>
> >>>>> Control dependencies pair normally with other types of barriers.
> >>>>> That said, please note that neither READ_ONCE() nor WRITE_ONCE()
> >>>>> are optional!  Without the READ_ONCE(), the compiler might combine the
> >>>>> load from 'a' with other loads from 'a'.  Without the WRITE_ONCE(),
> >>>>> the compiler might combine the store to 'b' with other stores to 'b'.
> >>>>> Either can result in highly counterintuitive effects on ordering.
> >>>>>
> >>>>> Worse yet, if the compiler is able to prove (say) that the value of
> >>>>> variable 'a' is always non-zero, it would be well within its rights
> >>>>> to optimize the original example by eliminating the "if" statement
> >>>>> as follows:
> >>>>>
> >>>>> 	q = a;
> >>>>> 	b = 1;  /* BUG: Compiler and CPU can both reorder!!! */
> >>>>>
> >>>>> So don't leave out the READ_ONCE().
> >>>>
> >>>> does not stand if the answer needs to be taken seriously.
> >>>> The READ_ONCE() is not good enough to prevent the "if" statement
> >>>> from being eliminated.
> >>>>
> >>>> Hmm... If we do need to care about such an extreme optimization,
> >>>> we should not rely on any control dependency in C source code
> >>>> at least until the compiler gets educated about control dependency.
> >>>>
> >>>> Is there some reasonable compromise?
> >>>
> >>> Right now, what we have are the volatile loads such as from READ_ONCE()
> >>> and WRITE_ONCE().  My defense of them has been that they need to properly
> >>> handle MMIO.  But not all compiler writers agree with me, so we should
> >>> program at least a bit defensively.  Though we should probably also
> >>> be writing tests to verify that the compiler is doing the right thing,
> >>> and those litmus tests should probably include your ">=" litmus test.
> >>> But we should not encourage developers to rely on it quite yet.
> >>>
> >>> My current position is that compiler writers are currently unlikely to
> >>> make their compilers do deep reasoning around the memory model, and that
> >>> they are currently unlikely to do cross-translation-unit assigned-value
> >>> analysis (though LTO might be changing that).  But given a static atomic
> >>> that is visible only within a translation unit, I would expect them
> >>> to do value-range analysis.  Hence my reluctance to present the ">="
> >>> variant as a pattern to follow.
> >>
> >> So if we declare "x" and "y" are not static (not file local),
> >> can the ">=" variant be exempted from the possible optimization?
> >>
> >> For example:
> >>
> >> 	extern int x, y;
> >>
> >> 	CPU 0                     CPU 1
> >> 	=======================   =======================
> >> 	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
> >> 	if (r1 >= 0)              if (r2 >= 0)
> >> 	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
> >>
> >> 	assert(!(r1 == 1 && r2 == 1));
> >>
> >> This looks safe to me.
> >
> > For the compilers I know about at the present time, yes.
>
> So if I respin the patch with the extern, would you still feel reluctant?

Yes, because I am not seeing how this change helps.  What is this telling
the reader that the original did not, and how does it help the reader
generate good concurrent code?

							Thanx, Paul

> Regards, Akira
>
> >
> > The underlying problem is that the standard says almost nothing about what
> > volatile does.  I usually argue that it was intended to be used for MMIO,
> > so any optimization that would prevent a device driver from working must
> > be prohibited on volatiles.  A lot of people really don't like volatile,
> > and would like to eliminate it, and these people also aren't very happy
> > about any attempt to better define volatile.
> >
> > Keeps things entertaining.  ;-)
> >
> > 						Thanx, Paul
> >
> >> Thoughts?
> >>
> >> Thanks, Akira
> >>
> >>>
> >>> 						Thanx, Paul
> >>>
> >>>> Thanks, Akira
> >>>>
> >>>>>
> >>>>> I don't happen to like that answer, by the way.  ;-)
> >>>>>
> >>>>> 						Thanx, Paul
> >>>>>
> >>>>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html
> >>>>>>>
> >>>>>>
> >>>>>> Great material to wake up mind in the morning!  Thanks.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Boqun
> >>>>>>
> >>>>>>> What are your thoughts on this?
> >>>>>>>
> >>>>>>> 						Thanx, Paul
> >>>>>>>
> >>>>>>>> Thanks, Akira
> >>>>>>>>
> >>>>>>>>> That said, I very much welcome critical reviews of memory-barriers.txt,
> >>>>>>>>> so please do feel free to continue doing that!
> >>>>>>>>>
> >>>>>>>>> 						Thanx, Paul
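As a concrete illustration of the MMIO argument quoted above, here is a
rough sketch of the kind of driver polling loop that depends on volatile
semantics.  The device, register layout, and STATUS_READY bit are made up
purely for illustration; the point is only that readl() is a volatile MMIO
read, so the compiler must re-read the register on every pass even though
nothing in the program ever writes that location:

	#include <linux/io.h>		/* readl(), __iomem */
	#include <asm/processor.h>	/* cpu_relax() */

	#define STATUS_READY	0x1	/* hypothetical "device ready" bit */

	/* Spin until the (hypothetical) device sets its ready bit. */
	static void wait_for_device(void __iomem *status_reg)
	{
		/*
		 * readl() is a volatile access: the compiler must emit a
		 * fresh load each time around the loop, because the
		 * hardware -- not the program -- is what changes the value.
		 * An "optimization" that hoisted or elided this load would
		 * break the driver.
		 */
		while (!(readl(status_reg) & STATUS_READY))
			cpu_relax();
	}

Whether those same volatile guarantees also suffice to keep the "if" (and
hence the control dependency) in the two-CPU example is, of course, exactly
the open question in this thread.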