Hi Alan and Akira,

Thanks very much for your answers! I have another question about
"4.3.4.2 A Volatile Solution". This section says:

  1. Implementations are forbidden from tearing an aligned volatile
     access when machine instructions of that access’s size and type
     are available.
  ......

Later it also says:

  To summarize, the volatile keyword can prevent load tearing and
  store tearing in cases where the loads and stores are machine-sized
  and properly aligned.

TBH, I am a little confused. Does volatile prevent load tearing and
store tearing only for aligned, machine-sized data types? Or, as long
as machine instructions exist for the access's size and type, does
volatile also prevent load tearing and store tearing even when the
data type is not machine-sized?

Thanks very much in advance!

Best Regards
Nan Xiao

On Mon, Nov 11, 2024 at 4:39 PM Akira Yokosawa <akiyks@xxxxxxxxx> wrote:
>
> On Mon, 11 Nov 2024 15:45:22 +0800, Nan Xiao wrote:
> > Hello,
> >
> > Greetings from me!
>
> Hi!
>
> >
> > I am reading "4.3.4.2 A Volatile Solution" of perfbook, and come
> > across following summary:
> >
> > To summarize, the volatile keyword can prevent load tearing and
> > store tearing in cases where the loads and stores are machine-sized
> > and properly aligned. It can also prevent load fusing, store fusing,
> > invented loads, and invented stores. ...
> >
> > At first I thought it means accessing volatile, aligned and
> > machine-sized data is atomic operation, so I wrote a small test
> > program to test on a "64-bit" Linux server:
> >
> > #include <pthread.h>
> > #include <stdio.h>
> > #include <stdatomic.h>
> > #include <stdint.h>
> >
> > volatile uint64_t sum;
> > atomic_ullong atomic_sum;
> >
> > void *thread(void *arg)
> > {
> >     for (int i = 0; i < 100000; i++)
> >     {
> >         sum++;
> >         atomic_fetch_add(&atomic_sum, 1);
> >     }
> >     return NULL;
> > }
> >
> > int main()
> > {
> >     pthread_t tid[4];
> >     for (int i = 0; i < sizeof(tid) / sizeof(tid[0]); i++)
> >     {
> >         pthread_create(&tid[i], NULL, thread, NULL);
> >     }
> >
> >     for (int i = 0; i < sizeof(tid) / sizeof(tid[0]); i++)
> >     {
> >         pthread_join(tid[i], NULL);
> >     }
> >
> >     printf("sum=%llu,atomic_sum=%llu\n", sum, atomic_sum);
> >     return 0;
> > }
> >
> > But the result seems not:
> >
> > $ gcc -pthread -O3 parallel.c -o parallel
> > $ ./parallel
> > sum=221785,atomic_sum=400000
> > $ gcc --version
> > gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22)
> > Copyright (C) 2018 Free Software Foundation, Inc.
> > This is free software; see the source for copying conditions. There is NO
> > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> >
> > So my understanding is: volatile doesn't guarantee the atomic
> > operation for aligned, machine-sized data, and we can only use
> > atomic_xxx data types and related functions to guarantee atomic
> > operations. Is my understanding correct? Or I misunderstood volatile?
> > Thanks very much in advance!
>
> Your code above does mostly the same things as Listings 5.1 and 5.2 in
> Section 5.1 "Why Isn’t Concurrent Counting Trivial?".
>
> "Atomic (or volatile) accesses" and "atomic increment operations"
> are quite different.
>
> Please read on!
>
>         Thanks, Akira
>
> >
> > Best Regards
> > Nan Xiao
>
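
For reference, here is a minimal sketch of the distinction drawn in the
quoted reply between an untorn volatile access and an atomic increment.
It is only an illustration, not code from perfbook: the names vsum,
asum, interleaved_volatile_increments() and two_atomic_increments() are
made up, and the interleaving of two "threads" is replayed by hand in a
single thread so that the outcome is deterministic. The assumption is
that "sum++" on a volatile variable behaves as a separate load, add, and
store, which is what allows updates to be lost even when no individual
load or store is torn.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical names, used for illustration only. */
static volatile uint64_t vsum;
static atomic_ullong asum;

/*
 * Replays one possible interleaving of two "vsum++" operations.
 * Each increment is written out as a volatile load followed by a
 * volatile store. Neither access is torn, yet one update is lost.
 */
static uint64_t interleaved_volatile_increments(void)
{
	vsum = 0;
	uint64_t a = vsum;	/* "thread A" loads 0 */
	uint64_t b = vsum;	/* "thread B" loads 0 */
	vsum = a + 1;		/* A stores 1 */
	vsum = b + 1;		/* B stores 1, overwriting A's update */
	return vsum;
}

/*
 * atomic_fetch_add() performs the load, add, and store as one
 * indivisible step, so no such interleaving is possible.
 */
static unsigned long long two_atomic_increments(void)
{
	atomic_store(&asum, 0);
	atomic_fetch_add(&asum, 1);
	atomic_fetch_add(&asum, 1);
	return atomic_load(&asum);
}
```

Calling both functions from a main() gives 1 for the hand-interleaved
volatile case and 2 for the atomic case, the same kind of shortfall as
sum=221785 versus atomic_sum=400000 in the test run above.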