Re: synchronize with a non-atomic flag

Akira Yokosawa <akiyks@xxxxxxxxx> · Fri, 6 Oct 2017 21:03:44 +0900

Hi Yubin,

On 2017/10/06 14:52, Yubin Ruan wrote:
> Hi,
> I saw lots of discussions on the web about possible races when doing
> synchronization between multiple threads/processes with locks or atomic
> operations[1][2]. From my point of view most of them are over-worrying.
> But I want to point out one particular issue here to see whether
> anyone has anything to say.
> 
> Imagine two processes communicate using only a uint32_t variable in
> shared memory, like this:
> 
>     // uint32_t variable in shared memory
>     uint32_t flag = 0;
> 
>     //process 1
>     while(1) {
>         if(READ_ONCE(flag) == 0) {
>             do_something();
>             WRITE_ONCE(flag, 1); // let another process to run
>         } else {
>             continue;
>         }
>     }
> 
>     //process 2
>     while(1) {
>         if(READ_ONCE(flag) == 1) {
>             printf("process 2 running...\n");
>             WRITE_ONCE(flag, 0); // let another process to run
>         } else {
>             continue;
>         }
>     }
> 
> On X86 or X64, I expect this code to run correctly, that is, I will
> get the two `printf' calls to print one after another.

Well, I see only one printf() above.
Do you mean:

    //process 1
    while(1) {
        if(READ_ONCE(flag) == 0) {
            printf("process 1 running...\n");
            WRITE_ONCE(flag, 1); // let another process to run
        } else {
            continue;
        }
    }

    //process 2
    while(1) {
        if(READ_ONCE(flag) == 1) {
            printf("process 2 running...\n");
            WRITE_ONCE(flag, 0); // let another process to run
        } else {
            continue;
        }
    }

?

Then the printf()s can be a problem, which partially negates your claim 3).
Without a memory barrier, there is no guarantee that printf()'s memory
accesses have completed by the time the result of WRITE_ONCE() becomes
visible to the other thread. The I/O operations in printf() might make
the situation trickier still.

In a more realistic case where you do something meaningful in
do_something() in both threads:

    //process 1
    while(1) {
        if(READ_ONCE(flag) == 0) {
            do_something();
            WRITE_ONCE(flag, 1); // let another process to run
        } else {
            continue;
        }
    }

    //process 2
    while(1) {
        if(READ_ONCE(flag) == 1) {
            do_something();
            WRITE_ONCE(flag, 0); // let another process to run
        } else {
            continue;
        }
    }

and if do_something() uses shared variables other than "flag",
you need a couple of memory barriers to order READ_ONCE(),
do_something(), and WRITE_ONCE(), something like:

    //process 1
    while(1) {
        if(READ_ONCE(flag) == 0) {
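            smp_rmb(); // order the read of flag before do_something()'s reads
            do_something();
            smp_wmb(); // order do_something()'s writes before the write of flag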
            WRITE_ONCE(flag, 1); // let another process to run
        } else {
            continue;
        }
    }

    //process 2
    while(1) {
        if(READ_ONCE(flag) == 1) {
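            smp_rmb(); // order the read of flag before do_something()'s reads
            do_something();
            smp_wmb(); // order do_something()'s writes before the write of flag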
            WRITE_ONCE(flag, 0); // let another process to run
        } else {
            continue;
        }
    }
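
For user-space code, where the kernel's smp_rmb()/smp_wmb() are not
available, a rough analogue of the barrier version can be sketched with
C11 fences. This is only a sketch under assumptions: it uses two threads
instead of two processes to stay short, the names side_main(),
run_with_fences() and shared_counter are made up for illustration, and
the C11 acquire/release fences are somewhat stronger than smp_rmb() and
smp_wmb() (they also order loads against stores).

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t flag;     /* 0: side 0 may run, 1: side 1 may run */
static uint64_t shared_counter;   /* plain shared data, handed over via flag */

struct side { uint32_t mine; int rounds; };   /* hypothetical per-thread args */

static void *side_main(void *arg)
{
    const struct side *s = arg;
    int done = 0;

    while (done < s->rounds) {
        /* Relaxed load: roughly what READ_ONCE(flag) does. */
        if (atomic_load_explicit(&flag, memory_order_relaxed) != s->mine) {
            sched_yield();        /* not our turn yet */
            continue;
        }
        /* Keep the accesses below from being ordered before the flag load. */
        atomic_thread_fence(memory_order_acquire);
        shared_counter++;         /* stands in for do_something() */
        /* Keep the accesses above from being ordered after the flag store. */
        atomic_thread_fence(memory_order_release);
        /* Relaxed store: roughly WRITE_ONCE(flag, other side). */
        atomic_store_explicit(&flag, s->mine ^ 1u, memory_order_relaxed);
        done++;
    }
    return NULL;
}

/* Runs both sides for `rounds` turns each; returns the final counter. */
uint64_t run_with_fences(int rounds)
{
    pthread_t t0, t1;
    struct side a = { 0, rounds }, b = { 1, rounds };

    assert(rounds >= 0);
    atomic_store(&flag, 0);
    shared_counter = 0;
    pthread_create(&t0, NULL, side_main, &a);
    pthread_create(&t1, NULL, side_main, &b);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return shared_counter;
}
```

Built with something like "cc -std=c11 -pthread", run_with_fences(n)
should return 2 * n, because every increment is handed over through
flag and none can be lost.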

In the Linux kernel memory model, you can use the acquire/release APIs instead:

    //process 1
    while(1) {
        if(smp_load_acquire(&flag) == 0) {
            do_something();
            smp_store_release(&flag, 1); // let another process to run
        } else {
            continue;
        }
    }

    //process 2
    while(1) {
        if(smp_load_acquire(&flag) == 1) {
            do_something();
            smp_store_release(&flag, 0); // let another process to run
        } else {
            continue;
        }
    }

The intention of the code is easier to see when you use well-defined APIs.
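
Outside the kernel, the same acquire/release pattern might look like
the following C11 sketch. Again, this rests on assumptions: threads
stand in for processes, and turn, payload, producer(), consumer() and
run_acquire_release() are illustrative names, not anything from your
code. With real processes the flag and the data would have to live in
a shared mapping.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t turn;   /* plays the role of "flag" */
static uint64_t payload;        /* plain data published through turn */
static uint64_t sum;            /* touched by the consumer only */

static void *producer(void *arg)
{
    int rounds = *(int *)arg;

    for (int i = 1; i <= rounds; ) {
        /* Acquire: the consumer's earlier read of payload is finished. */
        if (atomic_load_explicit(&turn, memory_order_acquire) == 0) {
            payload = (uint64_t)i;
            /* Release: payload is written before turn becomes 1. */
            atomic_store_explicit(&turn, 1, memory_order_release);
            i++;
        } else {
            sched_yield();
        }
    }
    return NULL;
}

static void *consumer(void *arg)
{
    int rounds = *(int *)arg;

    for (int i = 1; i <= rounds; ) {
        /* Acquire: pairs with the producer's release store. */
        if (atomic_load_explicit(&turn, memory_order_acquire) == 1) {
            sum += payload;
            /* Release: the read of payload is done before turn becomes 0. */
            atomic_store_explicit(&turn, 0, memory_order_release);
            i++;
        } else {
            sched_yield();
        }
    }
    return NULL;
}

/* Hands the values 1..rounds across; returns what the consumer summed. */
uint64_t run_acquire_release(int rounds)
{
    pthread_t p, c;

    assert(rounds >= 0);
    atomic_store(&turn, 0);
    payload = 0;
    sum = 0;
    pthread_create(&p, NULL, producer, &rounds);
    pthread_create(&c, NULL, consumer, &rounds);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return sum;
}
```

If the hand-over works, run_acquire_release(n) returns n * (n + 1) / 2,
since each value is written exactly once and read exactly once.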
Just my two cents.

              Thanks, Akira

>                                                That is because:
> 
>     1) on X86/X64, loads/stores of 32-bit variables are atomic
>     2) I use READ_ONCE/WRITE_ONCE to prevent possibly harmful compiler
> optimization on `flag'.
>     3) I use only one variable to communicate between two processes,
> so there is no need for any kind of barrier.
> 
> Does anyone have any objection to that?
> 
> I know using a lock or atomic operation will save me a lot of
> argument, but I think those things are unnecessary in this
> circumstance, and it matters where performance matters, so I am picky
> here...
> 
> Yubin
> 
> [1]: https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
> [2]: https://www.usenix.org/conference/osdi10/ad-hoc-synchronization-considered-harmful
> --
> To unsubscribe from this list: send the line "unsubscribe perfbook" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
