Re: synchronize with a non-atomic flag

Akira Yokosawa <akiyks@xxxxxxxxx> · Sat, 7 Oct 2017 23:36:45 +0900

On 2017/10/07 21:43:53 +0800, Yubin Ruan wrote:
> 2017-10-07 19:40 GMT+08:00 Akira Yokosawa <akiyks@xxxxxxxxx>:
>> On 2017/10/07 15:04:50 +0800, Yubin Ruan wrote:
>>> Thanks Paul and Akira,
>>>
>>> 2017-10-07 3:12 GMT+08:00 Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
>>>> On Fri, Oct 06, 2017 at 08:35:00PM +0800, Yubin Ruan wrote:
>>>>> 2017-10-06 20:03 GMT+08:00 Akira Yokosawa <akiyks@xxxxxxxxx>:
>>>>>> On 2017/10/06 14:52, Yubin Ruan wrote:
>>>>
>>>> [ . . . ]
>>>>
>>>>>> I/O operations in printf() might make the situation trickier.
>>>>>
>>>>> printf(3) is claimed to be thread-safe, so I think this issue will not
>>>>> concern us.
>>>
>>> so now I can pretty much confirm this.
>>
>> Yes. Now I recognize that POSIX.1c requires stdio functions to be MT-safe.
>> Here, MT-safe means that one call to printf() won't be disturbed by other
>> racing calls that write to stdout.
>>
>> I had been thrown off by the following description of MT-Safe in the
>> attributes(7) man page:
>>
>>     Being MT-Safe does not imply a function is atomic, nor  that  it
>>     uses  any of the memory synchronization mechanisms POSIX exposes
>>     to users. [...]
>>
>> Excerpt from a white paper at http://www.unix.org/whitepapers/reentrant.html:
>>
>>     The POSIX.1 and C-language functions that operate on character streams
>>     (represented by pointers to objects of type FILE) are required by POSIX.1c
>>     to be implemented in such a way that reentrancy is achieved (see ISO/IEC
>>     9945:1-1996, §8.2). This requirement has a drawback; it imposes
>>     substantial performance penalties because of the synchronization that
>>     must be built into the implementations of the functions for the sake of
>>     reentrancy. [...]
>>
>> Yubin, thank you for giving me the chance to realize this.
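
To make the per-call guarantee concrete, here is a minimal userspace sketch,
assuming POSIX threads and a temporary file standing in for stdout. Several
threads write whole lines with exactly one fprintf() call per line, and the
file is then read back to check that no line was torn. The names writer(),
count_intact_lines(), NTHREADS, NLINES and LINE_FMT are made up for
illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NTHREADS 4
#define NLINES   1000
#define LINE_FMT "thread %d says hello hello hello\n"

static FILE *out;  /* shared stream standing in for stdout */

/* Write NLINES lines, using exactly one fprintf() call per line. */
static void *writer(void *arg)
{
    int id = (int)(intptr_t)arg;

    assert(id >= 0 && id < NTHREADS);
    for (int i = 0; i < NLINES; i++)
        fprintf(out, LINE_FMT, id);
    return NULL;
}

/* Returns the number of intact lines read back, or -1 on failure
 * or if any line turns out to be torn. */
static int count_intact_lines(void)
{
    pthread_t tid[NTHREADS];
    char line[128], expect[128];
    int n = 0, id;

    out = tmpfile();
    if (out == NULL)
        return -1;
    for (intptr_t t = 0; t < NTHREADS; t++)
        if (pthread_create(&tid[t], NULL, writer, (void *)t) != 0)
            return -1;
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);

    rewind(out);
    while (n >= 0 && fgets(line, sizeof(line), out) != NULL) {
        /* Rebuild the line this thread id should have produced. */
        if (sscanf(line, "thread %d", &id) != 1) {
            n = -1;
            break;
        }
        snprintf(expect, sizeof(expect), LINE_FMT, id);
        n = (strcmp(line, expect) == 0) ? n + 1 : -1;
    }
    fclose(out);
    return n;
}
```

This only checks that each single call stays intact. It says nothing about
the order in which lines from different threads appear, and a message built
from several stdio calls would still need flockfile()/funlockfile() around
the group.
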
>>
>>>
>>>>>> In a more realistic case where you do something meaningful in
>>>>>> do_something() in both threads:
>>>>>>
>>>>>>     //process 1
>>>>>>     while(1) {
>>>>>>         if(READ_ONCE(flag) == 0) {
>>>>>>             do_something();
>>>>>>             WRITE_ONCE(flag, 1); // let the other process run
>>>>>>         } else {
>>>>>>             continue;
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     //process 2
>>>>>>     while(1) {
>>>>>>         if(READ_ONCE(flag) == 1) {
>>>>>>             do_something();
>>>>>>             WRITE_ONCE(flag, 0); // let the other process run
>>>>>>         } else {
>>>>>>             continue;
>>>>>>         }
>>>>>>     }
>>>>
>>>> In the Linux kernel, there is control-dependency ordering between
>>>> the READ_ONCE(flag) and any stores in either the then-clause or
>>>> the else-clause.  However, I see no ordering between do_something()
>>>> and the WRITE_ONCE().
>>>
>>> I was not aware of the "control-dependency" ordering issue in the
>>> Linux kernel before. Is it true for all architectures?
>>>
>>> But anyway, the ordering between READ_ONCE(flag) and any subsequent
>>> stores is guaranteed on x86/x86-64, so we don't need any memory barrier
>>> here.
>>>
>>>>>> and if do_something() uses some shared variables other than "flag",
>>>>>> you need a couple of memory barriers to ensure the ordering of
>>>>>> READ_ONCE(), do_something(), and WRITE_ONCE(), something like:
>>>>>>
>>>>>>     //process 1
>>>>>>     while(1) {
>>>>>>         if(READ_ONCE(flag) == 0) {
>>>>>>             smp_rmb();
>>>>>>             do_something();
>>>>>>             smp_wmb();
>>>>>>             WRITE_ONCE(flag, 1); // let the other process run
>>>>>>         } else {
>>>>>>             continue;
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     //process 2
>>>>>>     while(1) {
>>>>>>         if(READ_ONCE(flag) == 1) {
>>>>>>             smp_rmb();
>>>>>>             do_something();
>>>>>>             smp_wmb();
>>>>>>             WRITE_ONCE(flag, 0); // let the other process run
>>>>>>         } else {
>>>>>>             continue;
>>>>>>         }
>>>>>>     }
>>>>
>>>> Here, the control dependency again orders the READ_ONCE() against later
>>>> stores, and the smp_rmb() orders the READ_ONCE() against any later
>>>> loads.
>>>
>>> Understood and agreed.
>>>
>>>> The smp_wmb() orders do_something()'s writes (but not its reads!)
>>>> against the WRITE_ONCE().
>>>
>>> Understood and agreed. But do we really need the smp_wmb() on x86/x86-64?
>>> As far as I know, on x86/x86-64 stores are not reordered with other
>>> stores...[1]
>>>
>>>>>> In the Linux kernel memory model, you can use acquire/release APIs instead:
>>>>>>
>>>>>>     //process 1
>>>>>>     while(1) {
>>>>>>         if(smp_load_acquire(&flag) == 0) {
>>>>>>             do_something();
>>>>>>             smp_store_release(&flag, 1); // let the other process run
>>>>>>         } else {
>>>>>>             continue;
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     //process 2
>>>>>>     while(1) {
>>>>>>         if(smp_load_acquire(&flag) == 1) {
>>>>>>             do_something();
>>>>>>             smp_store_release(&flag, 0); // let the other process run
>>>>>>         } else {
>>>>>>             continue;
>>>>>>         }
>>>>>>     }
>>>>
>>>> This is probably the most straightforward of the above approaches.
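
For readers outside the kernel, a rough userspace analogue of the
acquire/release version can be sketched with C11 atomics. It assumes that
atomic_load_explicit(..., memory_order_acquire) and
atomic_store_explicit(..., memory_order_release) are acceptable stand-ins for
smp_load_acquire() and smp_store_release(), and it uses threads rather than
processes. The body of do_something(), the two counters, ROUNDS and
run_ping_pong() are hypothetical and exist only to make the handoff
checkable.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define ROUNDS 10000

static atomic_int flag;      /* 0: thread 0's turn, 1: thread 1's turn */
static long shared_counter;  /* plain data, touched only by the turn holder */
static long out_of_turn;     /* steps that saw an unexpected counter value */

/* Hypothetical do_something(): uses shared data other than "flag". */
static void do_something(int me)
{
    /* Thread 0 should always see an even count, thread 1 an odd one. */
    if ((shared_counter & 1) != me)
        out_of_turn++;
    shared_counter++;
}

static void *worker(void *arg)
{
    int me = *(int *)arg;

    for (int done = 0; done < ROUNDS; ) {
        /* Acquire: later accesses are not reordered before this load. */
        if (atomic_load_explicit(&flag, memory_order_acquire) == me) {
            do_something(me);
            /* Release: earlier accesses are not reordered after this store. */
            atomic_store_explicit(&flag, !me, memory_order_release);
            done++;
        } else {
            sched_yield();  /* be polite while waiting for our turn */
        }
    }
    return NULL;
}

/* Runs both threads to completion; returns the final counter or -1. */
static long run_ping_pong(void)
{
    pthread_t a, b;
    int id_a = 0, id_b = 1;

    atomic_store(&flag, 0);
    shared_counter = 0;
    out_of_turn = 0;
    if (pthread_create(&a, NULL, worker, &id_a) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, &id_b) != 0)
        return -1;
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    assert(atomic_load(&flag) == 0);  /* thread 1 hands the turn back last */
    return shared_counter;
}
```

With the handoff ordered this way, run_ping_pong() should return 2 * ROUNDS
and out_of_turn should stay at zero, because every access to the plain
counters happens between an acquire load and the matching release store.
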
>>>>
>>>> That said, if you really want a series of things to execute in a
>>>> particular order, why not just put them into the same process?
>>>
>>> I will be very happy if I can. But sometimes we just have to deal with
>>> issues concerning multiple processes...
>>>
>>> [1]: One thing that confuses me a little is that some people claim
>>> there are several guarantees on x86/x86-64[2]:
>>>     1) Loads are not reordered with other loads.
>>>     2) Stores are not reordered with other stores.
>>>     3) Stores are not reordered with older loads.
>>> (Note that loads may still be reordered with older stores to different
>>> locations.)
>>>
>>> So, if 1) and 2) are true, why do we have "lfence" and "sfence"
>>> instructions at all?
>>
>> Excerpt from Intel 64 and IA-32 Architectures Developer's Manual: Vol. 3A
>> Section 8.2.5
>>
>>     [...] Despite the fact that Pentium 4, Intel Xeon, and P6 family
>>     processors support processor ordering, Intel does not guarantee
>>     that future processors will support this model. To make software
>>     portable to future processors, it is recommended that operating systems
>>     provide critical region and resource control constructs and API's
>>     (application program interfaces) based on I/O, locking, and/or
>>     serializing instructions be used to synchronize access to shared
>>     areas of memory in multiple-processor systems. [...]
>>
>> So the answer seems to be "to make software portable to future processors".
> 
> Hmm... so currently these instructions are effectively no-ops?
> 

According to perfbook's Section 14.4.9 "x86" (as of current master),

    However, note that some SSE instructions are weakly ordered (clflush
    and non-temporal move instructions [Int04a]). CPUs that have SSE can
    use mfence for smp_mb(), lfence for smp_rmb(), and sfence for smp_wmb().

So as long as you don't use those weakly ordered SSE instructions, I guess
lfence and sfence are effectively no-ops as far as memory ordering goes.
But I'm not sure.
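
As an illustration of the case the perfbook passage singles out, here is a
minimal userspace sketch, assuming GCC or Clang with SSE2 available. A
producer fills a buffer with non-temporal stores via _mm_stream_si32(),
issues _mm_sfence(), and only then publishes a flag. On x86 a release store
is normally just an ordinary store, which is not expected to order earlier
non-temporal stores on its own, so the explicit sfence carries real weight
here. The names payload, ready, producer() and consume_sum() are
hypothetical, and builds without SSE2 fall back to ordinary stores.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32(), _mm_sfence() */
#endif

#define N 1024

static int payload[N];
static atomic_int ready;

static void *producer(void *arg)
{
    (void)arg;
    for (int i = 0; i < N; i++) {
#if defined(__SSE2__)
        _mm_stream_si32(&payload[i], i);  /* non-temporal, weakly ordered */
#else
        payload[i] = i;                   /* fallback: ordinary store */
#endif
    }
#if defined(__SSE2__)
    _mm_sfence();  /* order the non-temporal stores before the flag store */
#endif
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Waits for the flag, then sums the payload; returns -1 on failure. */
static long consume_sum(void)
{
    pthread_t tid;
    long sum = 0;

    atomic_store(&ready, 0);
    if (pthread_create(&tid, NULL, producer, NULL) != 0)
        return -1;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        sched_yield();
    for (int i = 0; i < N; i++)
        sum += payload[i];
    pthread_join(tid, NULL);
    return sum;
}
```

A passing run cannot prove the fence is required, since the reordering it
guards against is rare. The sketch only shows where sfence sits relative to
the weakly ordered stores and the flag.
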

Paul, could you enlighten us?

Akira

> Yubin
> 
>>
>>>
>>> [2]: I found those claims here, but I'm not sure whether they are true:
>>> https://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
>>>
>>
> 
