Re: A question about "4.3.4.2 A Volatile Solution" of perfbook

Nan Xiao <xiaonan830818@xxxxxxxxx> · Mon, 11 Nov 2024 17:30:09 +0800

Hi Alan and Akira,

Thanks very much for your answers! I have another question about
"4.3.4.2 A Volatile Solution". This section says:

   1. Implementations are forbidden from tearing an aligned volatile
access when machine instructions of that access’s size and type are
available. ......

Later, it also says:

   To summarize, the volatile keyword can prevent load tearing and
store tearing in cases where the loads and stores are machine-sized
and properly aligned.

TBH, I am a little confused. Does volatile prevent load tearing and
store tearing only for aligned, machine-sized data types? Or does it
also prevent them whenever machine instructions exist for the access's
size and type, regardless of whether that size is the machine word
size?

Thanks very much in advance!

Best Regards
Nan Xiao

On Mon, Nov 11, 2024 at 4:39 PM Akira Yokosawa <akiyks@xxxxxxxxx> wrote:
>
> On Mon, 11 Nov 2024 15:45:22 +0800, Nan Xiao wrote:
> > Hello,
> >
> > Greetings from me!
>
> Hi!
>
> >
> > I am reading "4.3.4.2 A Volatile Solution" of perfbook, and came
> > across the following summary:
> >
> >     To summarize, the volatile keyword can prevent load tearing and
> > store tearing in cases where the loads and stores are machine-sized
> > and properly aligned. It can also prevent load fusing, store fusing,
> > invented loads, and invented stores. ...
> >
> > At first I thought this meant that accessing volatile, aligned,
> > machine-sized data is an atomic operation, so I wrote a small test
> > program and ran it on a 64-bit Linux server:
> >
> > #include <inttypes.h>
> > #include <pthread.h>
> > #include <stdatomic.h>
> > #include <stdint.h>
> > #include <stdio.h>
> >
> > volatile uint64_t sum;      /* plain volatile counter */
> > atomic_ullong atomic_sum;   /* C11 atomic counter */
> >
> > void *thread(void *arg)
> > {
> >     (void)arg;
> >     for (int i = 0; i < 100000; i++)
> >     {
> >         sum++;                            /* volatile load, add, volatile store */
> >         atomic_fetch_add(&atomic_sum, 1); /* one atomic read-modify-write */
> >     }
> >     return NULL;
> > }
> >
> > int main(void)
> > {
> >     pthread_t tid[4];
> >
> >     for (size_t i = 0; i < sizeof(tid) / sizeof(tid[0]); i++)
> >     {
> >         pthread_create(&tid[i], NULL, thread, NULL);
> >     }
> >
> >     for (size_t i = 0; i < sizeof(tid) / sizeof(tid[0]); i++)
> >     {
> >         pthread_join(tid[i], NULL);
> >     }
> >
> >     /* PRIu64 matches uint64_t; atomic_load() yields unsigned long long */
> >     printf("sum=%" PRIu64 ",atomic_sum=%llu\n", sum, atomic_load(&atomic_sum));
> >     return 0;
> > }
> >
> > But the result suggests otherwise:
> >
> > $ gcc -pthread -O3 parallel.c -o parallel
> > $ ./parallel
> > sum=221785,atomic_sum=400000
> > $ gcc --version
> > gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22)
> > Copyright (C) 2018 Free Software Foundation, Inc.
> > This is free software; see the source for copying conditions.  There is NO
> > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> >
> > So my understanding is: volatile doesn't guarantee atomic
> > operations on aligned, machine-sized data, and only the atomic_xxx
> > data types and related functions can guarantee atomic operations.
> > Is my understanding correct, or have I misunderstood volatile?
> > Thanks very much in advance!
>
> Your code above does mostly the same things as Listings 5.1 and 5.2 in
> Section 5.1 "Why Isn’t Concurrent Counting Trivial?".
>
> "Atomic (or volatile) accesses" and "atomic increment operations"
> are quite different.
>
> Please read on!
>
>         Thanks, Akira
>
> >
> > Best Regards
> > Nan Xiao
>