Re: A question about "4.3.4.2 A Volatile Solution" of perfbook

Alan Huang <mmpgouride@xxxxxxxxx> · Mon, 11 Nov 2024 17:38:33 +0800

On Nov 11, 2024, at 17:30, Nan Xiao <xiaonan830818@xxxxxxxxx> wrote:
> 
> Hi Alan and Akira,
> 
> Thanks very much for your answers! I have another question from
> "4.3.4.2 A Volatile Solution". In this section, it mentions:
> 
>   1. Implementations are forbidden from tearing an aligned volatile
> access when machine instructions of that access’s size and type are
> available. ......
> 
> Then later it also mentions:
> 
>   To summarize, the volatile keyword can prevent load tearing and
> store tearing in cases where the loads and stores are machine-sized
> and properly aligned.
> 
> TBH, I am a little confused. Does volatile prevent load tearing and
> store tearing only for aligned, machine-sized data types? Or can it
> also prevent tearing for any data type that has corresponding machine
> instructions, whether or not that type is machine-sized?

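Machine-sized means the access fits in a single machine load or store.
In practice that is a size no larger than sizeof(long), for example
1, 2, 4, or 8 bytes on a typical 64-bit system.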

Take a look at the implementation of WRITE_ONCE() and READ_ONCE() in
the Linux kernel.
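
As a rough sketch of the idea (not the kernel's actual code, which adds
type checks and other details), the macros boil down to a volatile cast.
The names MY_READ_ONCE, MY_WRITE_ONCE, nonatomic_increment, and
simulated_lost_update below are hypothetical, and __typeof__ assumes GCC
or Clang. The sketch also shows why untorn loads and stores do not make
"sum++" atomic: the increment is still a separate load and store.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for READ_ONCE()/WRITE_ONCE().
 * The volatile cast asks the compiler for exactly one access. */
#define MY_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define MY_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* The type used below must be one the machine can load or store whole. */
static_assert(sizeof(uint64_t) <= sizeof(long long),
              "access size must match a machine load/store");

/* What "counter++" amounts to: one load, one add, one store.
 * Each access is untorn, but another thread can run between them. */
static void nonatomic_increment(uint64_t *counter)
{
    uint64_t tmp = MY_READ_ONCE(*counter);
    MY_WRITE_ONCE(*counter, tmp + 1);
}

/* Replays, in a single thread, one interleaving of two increments
 * in which both "threads" load before either one stores. */
static uint64_t simulated_lost_update(void)
{
    uint64_t counter = 0;
    uint64_t a = MY_READ_ONCE(counter);  /* "thread A" loads 0 */
    uint64_t b = MY_READ_ONCE(counter);  /* "thread B" loads 0 */
    MY_WRITE_ONCE(counter, a + 1);       /* A stores 1 */
    MY_WRITE_ONCE(counter, b + 1);       /* B stores 1, A's update is lost */
    return counter;
}
```

Called from one thread, nonatomic_increment() counts correctly. The
replayed interleaving in simulated_lost_update() ends with 1 rather than
2, which is the kind of lost update behind the low "sum" in the test
program quoted below.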

> 
> Thanks very much in advance!
> 
> 
> Best Regards
> Nan Xiao
> 
> On Mon, Nov 11, 2024 at 4:39 PM Akira Yokosawa <akiyks@xxxxxxxxx> wrote:
>> 
>> On Mon, 11 Nov 2024 15:45:22 +0800, Nan Xiao wrote:
>>> Hello,
>>> 
>>> Greetings from me!
>> 
>> Hi!
>> 
>>> 
>>> I am reading "4.3.4.2 A Volatile Solution" of perfbook, and came
>>> across the following summary:
>>> 
>>>    To summarize, the volatile keyword can prevent load tearing and
>>> store tearing in cases where the loads and stores are machine-sized
>>> and properly aligned. It can also prevent load fusing, store fusing,
>>> invented loads, and invented stores. ...
>>> 
>>> At first I thought this meant that accessing volatile, aligned,
>>> machine-sized data is an atomic operation, so I wrote a small test
>>> program and ran it on a 64-bit Linux server:
>>> 
>>> #include <pthread.h>
>>> #include <stdio.h>
>>> #include <stdatomic.h>
>>> #include <stdint.h>
>>> 
>>> volatile uint64_t sum;
>>> atomic_ullong atomic_sum;
>>> 
>>> void *thread(void *arg)
>>> {
>>>    for (int i = 0; i < 100000; i++)
>>>    {
>>>        sum++;
>>>        atomic_fetch_add(&atomic_sum, 1);
>>>    }
>>>    return NULL;
>>> }
>>> 
>>> int main(void)
>>> {
>>>    pthread_t tid[4];
>>> 
>>>    for (size_t i = 0; i < sizeof(tid) / sizeof(tid[0]); i++)
>>>    {
>>>        pthread_create(&tid[i], NULL, thread, NULL);
>>>    }
>>> 
>>>    for (size_t i = 0; i < sizeof(tid) / sizeof(tid[0]); i++)
>>>    {
>>>        pthread_join(tid[i], NULL);
>>>    }
>>> 
>>>    /* uint64_t need not be unsigned long long; cast to match %llu */
>>>    printf("sum=%llu,atomic_sum=%llu\n", (unsigned long long)sum,
>>>           (unsigned long long)atomic_sum);
>>>    return 0;
>>> }
>>> 
>>> But the result suggests otherwise:
>>> 
>>> $ gcc -pthread -O3 parallel.c -o parallel
>>> $ ./parallel
>>> sum=221785,atomic_sum=400000
>>> $ gcc --version
>>> gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22)
>>> Copyright (C) 2018 Free Software Foundation, Inc.
>>> This is free software; see the source for copying conditions.  There is NO
>>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>> 
>>> So my understanding is: volatile doesn't guarantee atomic
>>> operations on aligned, machine-sized data, and only the atomic_xxx
>>> data types and related functions guarantee atomic operations. Is my
>>> understanding correct, or have I misunderstood volatile?
>>> Thanks very much in advance!
>> 
>> Your code above does mostly the same things as Listings 5.1 and 5.2 in
>> Section 5.1 "Why Isn’t Concurrent Counting Trivial?".
>> 
>> "Atomic (or volatile) accesses" and "atomic increment operations"
>> are quite different.
>> 
>> Please read on!
>> 
>>        Thanks, Akira
>> 
>>> 
>>> Best Regards
>>> Nan Xiao
>> 
>