On Nov 11, 2024, at 17:30, Nan Xiao <xiaonan830818@xxxxxxxxx> wrote:
>
> Hi Alan and Akira,
>
> Thanks very much for your answers! I have another question from
> "4.3.4.2 A Volatile Solution". In this section, it mentions:
>
> 1. Implementations are forbidden from tearing an aligned volatile
> access when machine instructions of that access’s size and type are
> available. ......
>
> Then later it also mentions:
>
> To summarize, the volatile keyword can prevent load tearing and
> store tearing in cases where the loads and stores are machine-sized
> and properly aligned.
>
> TBH, I am a little confused. Does volatile prevent load tearing and
> store tearing for aligned and machine-sized data type only? Or as long
> as the data type has related machine instructions, no matter whether
> its data type is machine-sized or not, volatile can also prevent load
> tearing and store tearing?

Machine-sized means smaller than or equal to sizeof(long). Take a look
at the implementation of WRITE_ONCE() and READ_ONCE() in the Linux kernel.

>
> Thanks very much in advance!
>
>
> Best Regards
> Nan Xiao
>
> On Mon, Nov 11, 2024 at 4:39 PM Akira Yokosawa <akiyks@xxxxxxxxx> wrote:
>>
>> On Mon, 11 Nov 2024 15:45:22 +0800, Nan Xiao wrote:
>>> Hello,
>>>
>>> Greetings from me!
>>
>> Hi!
>>
>>>
>>> I am reading "4.3.4.2 A Volatile Solution" of perfbook, and come
>>> across following summary:
>>>
>>> To summarize, the volatile keyword can prevent load tearing and
>>> store tearing in cases where the loads and stores are machine-sized
>>> and properly aligned. It can also prevent load fusing, store fusing,
>>> invented loads, and invented stores. ...
>>>
>>> At first I thought it means accessing volatile, aligned and
>>> machine-sized data is atomic operation, so I wrote a small test
>>> program to test on a "64-bit" Linux server:
>>>
>>> #include <pthread.h>
>>> #include <stdio.h>
>>> #include <stdatomic.h>
>>> #include <stdint.h>
>>>
>>> volatile uint64_t sum;
>>> atomic_ullong atomic_sum;
>>>
>>> void *thread(void *arg)
>>> {
>>>     for (int i = 0; i < 100000; i++)
>>>     {
>>>         sum++;
>>>         atomic_fetch_add(&atomic_sum, 1);
>>>     }
>>>     return NULL;
>>> }
>>>
>>> int main()
>>> {
>>>     pthread_t tid[4];
>>>     for (int i = 0; i < sizeof(tid) / sizeof(tid[0]); i++)
>>>     {
>>>         pthread_create(&tid[i], NULL, thread, NULL);
>>>     }
>>>
>>>     for (int i = 0; i < sizeof(tid) / sizeof(tid[0]); i++)
>>>     {
>>>         pthread_join(tid[i], NULL);
>>>     }
>>>
>>>     printf("sum=%llu,atomic_sum=%llu\n", sum, atomic_sum);
>>>     return 0;
>>> }
>>>
>>> But the result seems not:
>>>
>>> $ gcc -pthread -O3 parallel.c -o parallel
>>> $ ./parallel
>>> sum=221785,atomic_sum=400000
>>> $ gcc --version
>>> gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22)
>>> Copyright (C) 2018 Free Software Foundation, Inc.
>>> This is free software; see the source for copying conditions. There is NO
>>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>>
>>> So my understanding is: volatile doesn't guarantee the atomic
>>> operation for aligned, machine-sized data, and we can only use
>>> atomic_xxx data types and related functions to guarantee atomic
>>> operations. Is my understanding correct? Or I misunderstood volatile?
>>> Thanks very much in advance!
>>
>> Your code above does mostly the same things as Listings 5.1 and 5.2 in
>> Section 5.1 "Why Isn’t Concurrent Counting Trivial?".
>>
>> "Atomic (or volatile) accesses" and "atomic increment operations"
>> are quite different.
>>
>> Please read on!
>>
>> Thanks, Akira
>>
>>>
>>> Best Regards
>>> Nan Xiao
>>
>
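To make the difference between a volatile access and an atomic increment
concrete, here is a minimal sketch. READ_ONCE_SKETCH() and
WRITE_ONCE_SKETCH() are hypothetical, simplified stand-ins for the kernel
macros (the real ones do additional type checking), and they assume the
GCC/Clang __typeof__ extension. The first function replays, in a single
thread, one interleaving that two threads running "sum++" could produce:
each load and each store is untorn, yet one increment is still lost.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical simplified stand-ins for READ_ONCE()/WRITE_ONCE():
 * a cast to volatile forces exactly one access of the given type. */
#define READ_ONCE_SKETCH(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Replays "sum++" from two imaginary threads A and B as the separate
 * load and store steps that a volatile increment consists of. */
static uint64_t lost_update_demo(void)
{
    uint64_t sum = 0;

    uint64_t a = READ_ONCE_SKETCH(sum);  /* A loads 0 */
    uint64_t b = READ_ONCE_SKETCH(sum);  /* B loads 0 */
    WRITE_ONCE_SKETCH(sum, a + 1);       /* A stores 1 */
    WRITE_ONCE_SKETCH(sum, b + 1);       /* B stores 1: A's update is lost */

    return READ_ONCE_SKETCH(sum);
}

/* The same two increments as atomic read-modify-write operations:
 * no other access can slip in between the load and the store. */
static uint64_t atomic_increment_demo(void)
{
    atomic_ullong sum = 0;

    atomic_fetch_add(&sum, 1);
    atomic_fetch_add(&sum, 1);

    return atomic_load(&sum);
}

/* Usage: call from main() to check both outcomes. */
static void check_demos(void)
{
    assert(lost_update_demo() == 1);
    assert(atomic_increment_demo() == 2);
}
```

So volatile (or READ_ONCE()/WRITE_ONCE()) makes each individual access
indivisible for aligned, machine-sized data, but it does not make the
load-add-store sequence of an increment indivisible.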