Re: synchronize with a non-atomic flag

On 2017/10/07 15:04:50 +0800, Yubin Ruan wrote:
> Thanks Paul and Akira,
> 
> 2017-10-07 3:12 GMT+08:00 Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
>> On Fri, Oct 06, 2017 at 08:35:00PM +0800, Yubin Ruan wrote:
>>> 2017-10-06 20:03 GMT+08:00 Akira Yokosawa <akiyks@xxxxxxxxx>:
>>>> On 2017/10/06 14:52, Yubin Ruan wrote:
>>
>> [ . . . ]
>>
>>>> I/O operations in printf() might make the situation trickier.
>>>
>>> printf(3) is claimed to be thread-safe, so I think this issue will not
>>> concern us.
> 
> so now I can pretty much confirm this.

Yes. Now I recognize that POSIX.1c requires stdio functions to be MT-safe.
By MT-safe, I mean that one call to printf() won't be disturbed by other
threads' concurrent calls writing to stdout.

I was misled by the following description of MT-Safe in the attributes(7)
man page:

    Being MT-Safe does not imply a function is atomic, nor  that  it
    uses  any of the memory synchronization mechanisms POSIX exposes
    to users. [...]
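
For example, the following sketch (my own, not from the standard) can
print its two lines in either order, but MT-safety guarantees that the
bytes emitted by the two printf() calls never intermix:

    #include <pthread.h>
    #include <stdio.h>

    static void *worker(void *arg)
    {
        /* Each printf() call is atomic with respect to other stdio
         * calls on the same stream. */
        printf("hello from thread %ld\n", (long)arg);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;

        pthread_create(&t1, NULL, worker, (void *)1L);
        pthread_create(&t2, NULL, worker, (void *)2L);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;    /* compile with: cc -pthread */
    }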

Excerpt from a white paper at http://www.unix.org/whitepapers/reentrant.html:

    The POSIX.1 and C-language functions that operate on character streams
    (represented by pointers to objects of type FILE) are required by POSIX.1c
    to be implemented in such a way that reentrancy is achieved (see ISO/IEC
    9945:1-1996, §8.2). This requirement has a drawback; it imposes
    substantial performance penalties because of the synchronization that
    must be built into the implementations of the functions for the sake of
    reentrancy. [...]

Yubin, thank you for giving me the chance to realize this.

> 
>>>> In a more realistic case where you do something meaningful in
>>>> do_something() in both threads:
>>>>
>>>>     //process 1
>>>>     while(1) {
>>>>         if(READ_ONCE(flag) == 0) {
>>>>             do_something();
>>>>             WRITE_ONCE(flag, 1); // let another process to run
>>>>         } else {
>>>>             continue;
>>>>         }
>>>>     }
>>>>
>>>>     //process 2
>>>>     while(1) {
>>>>         if(READ_ONCE(flag) == 1) {
>>>>             do_something();
>>>>             WRITE_ONCE(flag, 0); // let another process to run
>>>>         } else {
>>>>             continue;
>>>>         }
>>>>     }
>>
>> In the Linux kernel, there is control-dependency ordering between
>> the READ_ONCE(flag) and any stores in either the then-clause or
>> the else-clause.  However, I see no ordering between do_something()
>> and the WRITE_ONCE().
> 
> I was not aware of the "control-dependency" ordering issue in the
> Linux kernel before. Is it true for all architectures?
> 
> But anyway, the ordering between READ_ONCE(flag) and any subsequent
> stores is guaranteed on X86/64, so we don't need any memory barrier
> here.
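
As far as I know, yes: the load-to-store ordering provided by a control
dependency holds on all architectures the Linux kernel supports, because
CPUs do not let speculative stores become visible to other CPUs.  See the
"CONTROL DEPENDENCIES" section of Documentation/memory-barriers.txt.
A minimal sketch of the pattern (with a hypothetical variable "other"):

    if (READ_ONCE(flag) == 0)    /* load */
        WRITE_ONCE(other, 1);    /* dependent store: the CPU cannot
                                  * commit it before the load returns */

Note that the main hazard here is the compiler, not the CPU, which is
why the accesses use READ_ONCE()/WRITE_ONCE(); a plain load and store
would let the compiler break the dependency, e.g. by hoisting the store
out of the branch.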
> 
>>>> and if do_something() uses some shared variables other than "flag",
>>>> you need a couple of memory barriers to ensure the ordering of
>>>> READ_ONCE(), do_something(), and WRITE_ONCE() something like:
>>>>
>>>>     //process 1
>>>>     while(1) {
>>>>         if(READ_ONCE(flag) == 0) {
>>>>             smp_rmb();
>>>>             do_something();
>>>>             smp_wmb();
>>>>             WRITE_ONCE(flag, 1); // let another process to run
>>>>         } else {
>>>>             continue;
>>>>         }
>>>>     }
>>>>
>>>>     //process 2
>>>>     while(1) {
>>>>         if(READ_ONCE(flag) == 1) {
>>>>             smp_rmb();
>>>>             do_something();
>>>>             smp_wmb();
>>>>             WRITE_ONCE(flag, 0); // let another process to run
>>>>         } else {
>>>>             continue;
>>>>         }
>>>>     }
>>
>> Here, the control dependency again orders the READ_ONCE() against later
>> stores, and the smp_rmb() orders the READ_ONCE() against any later
>> loads.
> 
> Understand and agree.
> 
>> The smp_wmb() orders do_something()'s writes (but not its reads!)
>> against the WRITE_ONCE().
> 
> Understand and agree. But do we really need the smp_rmb() on X86/64?
> As far as I know, on X86/64 loads are not reordered with other
> loads...[1]
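
To make the barrier pairing concrete, here is a sketch of process 1 with
a hypothetical shared variable "data" standing in for do_something()'s
work (my own example, not from the earlier mail):

    int data;    /* shared with process 2, protected by the flag hand-off */

    /* process 1 */
    while (1) {
        if (READ_ONCE(flag) == 0) {
            smp_rmb();               /* pairs with process 2's smp_wmb():
                                      * orders the flag load before the
                                      * data load below */
            data = data + 1;         /* do_something(): reads and writes data */
            smp_wmb();               /* pairs with process 2's smp_rmb():
                                      * orders the data store before the
                                      * flag store */
            WRITE_ONCE(flag, 1);     /* hand off to process 2 */
        }
    }

As for X86/64: there, smp_rmb() and smp_wmb() reduce to compiler-only
barriers (barrier()), so they are nearly free at run time but are still
needed to keep the compiler from reordering the accesses.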
> 
>>>> In Linux kernel memory model, you can use acquire/release APIs instead:
>>>>
>>>>     //process 1
>>>>     while(1) {
>>>>         if(smp_load_acquire(&flag) == 0) {
>>>>             do_something();
>>>>             smp_store_release(&flag, 1); // let another process to run
>>>>         } else {
>>>>             continue;
>>>>         }
>>>>     }
>>>>
>>>>     //process 2
>>>>     while(1) {
>>>>         if(smp_load_acquire(&flag) == 1) {
>>>>             do_something();
>>>>             smp_store_release(&flag, 0); // let another process to run
>>>>         } else {
>>>>             continue;
>>>>         }
>>>>     }
>>
>> This is probably the most straightforward of the above approaches.
>>
>> That said, if you really want a series of things to execute in a
>> particular order, why not just put them into the same process?
> 
> I will be very happy if I can. But sometimes we just have to deal with
> issues concerning multiple processes...
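
Outside the kernel, you can get the same acquire/release hand-off between
processes with C11 atomics in a shared mapping.  A minimal sketch, with a
made-up do_something() and error handling omitted:

    #include <stdatomic.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void do_something(void) { /* work on shared data */ }

    int main(void)
    {
        /* One flag shared by parent and child via an anonymous mapping. */
        _Atomic int *flag = mmap(NULL, sizeof(*flag),
                                 PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        atomic_init(flag, 0);

        if (fork() == 0) {                       /* "process 2" */
            while (atomic_load_explicit(flag, memory_order_acquire) != 1)
                continue;                        /* spin until handed off */
            do_something();                      /* sees process 1's writes */
            atomic_store_explicit(flag, 0, memory_order_release);
            _exit(0);
        }
        do_something();                          /* "process 1" */
        atomic_store_explicit(flag, 1, memory_order_release);
        wait(NULL);
        return 0;
    }

smp_load_acquire()/smp_store_release() correspond to the
memory_order_acquire/memory_order_release pair here.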
> 
> [1]: One thing that got me a little confused is that some people claim that
> on x86/64 there are several guarantees[2]:
>     1) Loads are not reordered with other loads.
>     2) Stores are not reordered with other stores.
>     3) Stores are not reordered with older loads.
> (note that Loads may still be reordered with older stores to different
> locations)
> 
> So, if 1) and 2) are true, why do we have "lfence" and "sfence"
> instructions at all?

Excerpt from Intel 64 and IA-32 Architectures Developer's Manual: Vol. 3A
Section 8.2.5

    [...] Despite the fact that Pentium 4, Intel Xeon, and P6 family
    processors support processor ordering, Intel does not guarantee
    that future processors will support this model. To make software
    portable to future processors, it is recommended that operating systems
    provide critical region and resource control constructs and API's
    (application program interfaces) based on I/O, locking, and/or
    serializing instructions be used to synchronize access to shared
    areas of memory in multiple-processor systems. [...]

So the answer seems to be: to make software portable to future processors.
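
That said, as far as I know sfence is not purely about future processors
even today: x86 also has weakly-ordered operations, such as non-temporal
stores, that fall outside guarantee 2).  A sketch using compiler
intrinsics (my own example):

    #include <emmintrin.h>   /* _mm_stream_si32(); pulls in _mm_sfence() */

    void publish(int *data, volatile int *flag)
    {
        _mm_stream_si32(data, 42);   /* non-temporal (movnti) store:
                                      * weakly ordered even on x86 */
        _mm_sfence();                /* drain it to global visibility */
        *flag = 1;                   /* only then publish the flag */
    }

Similarly, lfence matters for weakly-ordered loads (e.g. from
write-combining memory) and for serializing uses such as timing with
rdtsc.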

        Thanks, Akira

> 
> [2]: I found those claims here, but not so sure whether or not they
> are true: https://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
> 
