Re: [PATCH] tools/memory-model: Weaken ctrl dependency definition in explanation.txt

Paul Heidekrüger <Paul.Heidekrueger@xxxxxxxxx> · Fri, 2 Sep 2022 10:40:34 +0200

On 31. Aug 2022, at 19:38, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:

> On Wed, Aug 31, 2022 at 06:42:05PM +0200, Paul Heidekrüger wrote:
>> On 31. Aug 2022, at 03:57, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>> 
>>> On Tue, Aug 30, 2022 at 05:12:33PM -0400, Joel Fernandes wrote:
>>>> On 8/30/2022 5:08 PM, Joel Fernandes wrote:
>>>>> On 8/30/2022 4:44 PM, Paul Heidekrüger wrote:
>>>>>> The current informal control dependency definition in explanation.txt is
>>>>>> too broad and, as dicsussed, needs to be updated.
>>>>>> 
>>>>>> Consider the following example:
>>>>>> 
>>>>>>> if(READ_ONCE(x))
>>>>>>> 	return 42;
>>>>>>> 
>>>>>>> 	WRITE_ONCE(y, 42);
>>>>>>> 
>>>>>>> 	return 21;
>>>>>> 
>>>>>> The read event determines whether the write event will be executed "at
>>>>>> all" - as per the current definition - but the formal LKMM does not
>>>>>> recognize this as a control dependency.
>>>>>> 
>>>>>> Introduce a new defintion which includes the requirement for the second
>>>>>> memory access event to syntactically lie within the arm of a non-loop
>>>>>> conditional.
>>>>>> 
>>>>>> Link: https://lore.kernel.org/all/20220615114330.2573952-1-paul.heidekrueger@xxxxxxxxx/
>>>>>> Cc: Marco Elver <elver@xxxxxxxxxx>
>>>>>> Cc: Charalampos Mainas <charalampos.mainas@xxxxxxxxx>
>>>>>> Cc: Pramod Bhatotia <pramod.bhatotia@xxxxxxxxx>
>>>>>> Cc: Soham Chakraborty <s.s.chakraborty@xxxxxxxxxx>
>>>>>> Cc: Martin Fink <martin.fink@xxxxxxxxx>
>>>>>> Signed-off-by: Paul Heidekrüger <paul.heidekrueger@xxxxxxxxx>
>>>>>> Co-developed-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
>>>>>> ---
>>>>>> 
>>>>>> @Alan:
>>>>>> 
>>>>>> Since I got it wrong the last time, I'm adding you as a co-developer after my
>>>>>> SOB. I'm sorry if this creates extra work on your side due to you having to
>>>>>> resubmit the patch now with your SOB if I understand correctly, but since it's
>>>>>> based on your wording from the other thread, I definitely wanted to give you
>>>>>> credit.
>>>>>> 
>>>>>> tools/memory-model/Documentation/explanation.txt | 7 ++++---
>>>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>> 
>>>>>> diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt
>>>>>> index ee819a402b69..0bca50cac5f4 100644
>>>>>> --- a/tools/memory-model/Documentation/explanation.txt
>>>>>> +++ b/tools/memory-model/Documentation/explanation.txt
>>>>>> @@ -464,9 +464,10 @@ to address dependencies, since the address of a location accessed
>>>>>> through a pointer will depend on the value read earlier from that
>>>>>> pointer.
>>>>>> 
>>>>>> -Finally, a read event and another memory access event are linked by a
>>>>>> -control dependency if the value obtained by the read affects whether
>>>>>> -the second event is executed at all.  Simple example:
>>>>>> +Finally, a read event X and another memory access event Y are linked by
>>>>>> +a control dependency if Y syntactically lies within an arm of an if,
>>>>>> +else or switch statement and the condition guarding Y is either data or
>>>>>> +address-dependent on X.  Simple example:
>> 
>> Thank you both for commenting!
>> 
>>> "if, else or switch" should be just "if or switch".  In C there is no 
>>> such thing as an "else" statement; an "else" clause is merely part of 
>>> an "if" statement.  In fact, maybe "body" would be more appropriate than 
>>> "arm", because "switch" statements don't have arms -- they have cases.
>> 
>> Right. What do you think of "branch"? "Body" to me suggests that there's
>> only one and therefore that the else clause isn't included.
>> 
>> Would it be fair to say that switch statements have branches? I guess
>> because switch statements are a convenient way of writing goto's, i.e.
>> jumps, it's a stretch and basically the same as saying "arm"?
>> 
>> Maybe we can avoid the arm / case clash by just having a definition for if
>> statements and appending something like "similarly for switch statements"?
> 
> That sounds good.
> 
>>>>> 'conditioning guarding Y' sounds confusing to me as it implies to me that the
>>>>> condition's evaluation depends on Y. I much prefer Alan's wording from the
>>>>> linked post saying something like 'the branch condition is data or address
>>>>> dependent on X, and Y lies in one of the arms'.
>>>>> 
>>>>> I have to ask though, why doesn't this imply that the second instruction never
>>>>> executes at all? I believe that would break the MP-pattern if it were not true.
>>>> 
>>>> About my last statement, I believe your patch does not disagree with the
>>>> correctness of the earlier text but just wants to improve it. If that's case
>>>> then that's fine.
>>> 
>>> The biggest difference between the original text and Paul's suggested 
>>> update is that the new text makes clear that Y has to lie within the 
>>> body of the "if" or "switch" statement.  If Y follows the end of the 
>>> if/else, as in the example at the top of this email, then it does have 
>>> not a control dependency on X (at least, not via that if/else), even 
>>> though the value read by X does determine whether or not Y will execute.
>>> 
>>> [It has to be said that this illustrates a big weakness of the LKMM: It 
>>> isn't cognizant of "goto"s or "return"s.  This naturally derives from 
>>> limitations of the herd tool, but the situation could be improved.  So 
>>> for instance, I don't think it would cause trouble to say that in:
>>> 
>>> 	if (READ_ONCE(x) == 0)
>>> 		return;
>>> 	WRITE_ONCE(y, 5);
>>> 
>>> there really is a control dependence from x to y, even though the 
>>> WRITE_ONCE is outside the body of the "if" statement.  Certainly the 
>>> compiler can't reorder the write before the read.  But AFAIK there's no 
>>> way to include a "return" statement in a litmus test for herd.  Or a 
>>> subroutine definition, for that matter.]
>>> 
>>> I agree that "condition guarding Y" is somewhat awkward.  "the 
>>> condition of the if (or the expression of the switch)" might be better, 
>>> even though it is somewhat awkward as well.  At least it's more 
>>> explicit.
>> 
>> Maybe we can reuse the wording from the data and address dependency
>> definition here and say "affects"?
>> 
>> Putting it all together:
>> 
>>> Finally, a read event X and another memory access event Y are linked by a
>>> control dependency if Y syntactically lies within a branch of an if or
>>> switch statement and X affects the evaluation of that statement's
>>> condition via a data or address dependency.
>> 
>> Alternatively without the arm / case clash:
>> 
>>> Finally, a read event X and another memory access event Y are linked by a
>>> control dependency if Y syntactically lies within an arm of an if
>>> statement and X affects the evaluation of the if condition via a data or
>>> address dependency.  Similarly for switch statements.
>> 
>> What do you think?
> 
> I like the second one.  How about combining the last two sentences?  
> 
> 	... via a data or address dependency (or similarly for a switch 
> 	statement).

Yes, sounds good!

> Now I suppose someone will pipe up and ask about the conditional 
> expressions in "for", "while" and "do" statements...  :-)

Happy to have obliged :-)
https://lore.kernel.org/all/20F4C097-24B4-416B-95EE-AC11F5952B44@xxxxxxxxx/

Do you think the text should explicitly address control dependencies in the
context of loops as well? If yes, would it be a separate patch, or would it
make sense to combine it with this one?

Many thanks,
Paul