On 2017/05/14 17:31:04 -0700, Paul E. McKenney wrote:
> On Mon, May 15, 2017 at 12:15:34AM +0900, Akira Yokosawa wrote:
>> On 2017/05/13 21:20:23 -0700, Paul E. McKenney wrote:
>>> On Sun, May 14, 2017 at 10:31:51AM +0900, Akira Yokosawa wrote:
>>>> On 2017/05/14 9:58, Akira Yokosawa wrote:
>>>>> On 2017/05/14 7:56, Paul E. McKenney wrote:
>>>>>> On Sat, May 13, 2017 at 11:37:06PM +0900, Akira Yokosawa wrote:
>>>>>>> On 2017/05/13 05:45:56 -0700, Paul E. McKenney wrote:
>>>>>>>> On Sat, May 13, 2017 at 09:04:38PM +0900, Akira Yokosawa wrote:
>>>>>>>>> Hi Jason & Paul,
>>>>>>>>>
>>>>>>>>> Although this has already been applied, I have a comment.
>>>>>>>>>
>>>>>>>>> On 2017/05/11 23:03:41 +0800, Junchang Wang wrote:
>>>>>>>>>> Signed-off-by: Junchang Wang <junchangwang@xxxxxxxxx>
>>>>>>>>>> ---
>>>>>>>>>>  CodeSamples/count/count_stat_eventual.c | 8 ++++----
>>>>>>>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/CodeSamples/count/count_stat_eventual.c b/CodeSamples/count/count_stat_eventual.c
>>>>>>>>>> index 059ab8b..cbde4aa 100644
>>>>>>>>>> --- a/CodeSamples/count/count_stat_eventual.c
>>>>>>>>>> +++ b/CodeSamples/count/count_stat_eventual.c
>>>>>>>>>> @@ -27,12 +27,12 @@ int stopflag;
>>>>>>>>>>
>>>>>>>>>>  void inc_count(void)
>>>>>>>>>>  {
>>>>>>>>>> -        ACCESS_ONCE(__get_thread_var(counter))++;
>>>>>>>>>> +        READ_ONCE(__get_thread_var(counter))++;
>>>>>>>>>
>>>>>>>>> This is OK because READ_ONCE() is defined as the same as ACCESS_ONCE()
>>>>>>>>> in CodeSamples. However, the definition in the current Linux kernel
>>>>>>>>> would not permit this.
>>>>>>>>>
>>>>>>>>> A read-modify-write access would need both READ_ONCE() and WRITE_ONCE().
>>>>>>>>> However, since "counter" is thread local and updated only by its owner,
>>>>>>>>> we don't need READ_ONCE() here. So:
>>>>>>>>>
>>>>>>>>> +        WRITE_ONCE(__get_thread_var(counter), __get_thread_var(counter) + 1);
>>>>>>>>>
>>>>>>>>> should have been sufficient.
>>
>> So, I measured the performance after this change.
>> Unfortunately, this change doubles the overhead of the updater.
>>
>> By inspecting the generated code, it looks like GCC (v5.4 for x86_64 with the -O3 option)
>> can't figure out that the two "__get_thread_var(counter)"s are identical, and uses two
>> registers as pointers to access it. __get_thread_var() seems complex enough
>> to obfuscate the compiler.
>>
>> To avoid this performance regression, we can use a temporary pointer as a hint for
>> optimization:
>>
>> void inc_count(void)
>> {
>>         unsigned long *p_cnt = &__get_thread_var(counter);
>>
>>         WRITE_ONCE(*p_cnt, *p_cnt + 1);
>> }
>>
>> Another idea is to restore ACCESS_ONCE().
>> As long as the argument is a properly aligned scalar type, ACCESS_ONCE() should be OK.
>>
>>         ACCESS_ONCE(__get_thread_var(counter))++;
>>
>> will emit both load and store instructions, but in this short function,
>> the compiler has no way to optimize away either access even if we don't use
>> ACCESS_ONCE().
>>
>> Thoughts?
>
> I am tempted to deprecate ACCESS_ONCE(), but yes, it is hard for the
> compiler to optimize. Unless, that is, you allow the compiler to
> inline the function. Which is the case by default.
>
> Getting back to READ_ONCE(), how does the following patch look?

So, you've already pushed it.
As for the definition of READ_ONCE(), it looks good.
You might also want to update the definition of READ_ONCE()
at the end of Section 4.2.5.

As for the code snippet, I'm anticipating your take on the discussion
on Patch 2/2 in another thread.
Optimization of inc_count() can wait for the time being.

Thanks, Akira

>
>                                                         Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 85747990ec809f57d61d30cd27bdc18b58163681
> Author: Paul E.
McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Date:   Sun May 14 17:26:41 2017 -0700
>
>     count: Don't in-place increment a READ_ONCE()
>
>     This commit prohibits READ_ONCE(x)++ and fixes the occurrence of this
>     in count_stat_eventual.c.
>
>     Reported-by: Akira Yokosawa <akiyks@xxxxxxxxx>
>     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>
> diff --git a/CodeSamples/api-pthreads/api-pthreads.h b/CodeSamples/api-pthreads/api-pthreads.h
> index b12c3e14911c..c6031f7d59b7 100644
> --- a/CodeSamples/api-pthreads/api-pthreads.h
> +++ b/CodeSamples/api-pthreads/api-pthreads.h
> @@ -127,7 +127,7 @@ static __inline__ int spin_is_locked(spinlock_t *sp)
>  #define spin_unlock_irqrestore(l, f) do { f = 0; spin_unlock(l); } while (0)
>
>  #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> -#define READ_ONCE(x) ACCESS_ONCE(x)
> +#define READ_ONCE(x) ({ typeof(x) ___x = ACCESS_ONCE(x); ___x; })
>  #define WRITE_ONCE(x, val) ({ ACCESS_ONCE(x) = (val); })
>  #ifndef unlikely
>  #define unlikely(x) x
> diff --git a/CodeSamples/api.h b/CodeSamples/api.h
> index a18c70f2346c..3b4bc1629c7a 100644
> --- a/CodeSamples/api.h
> +++ b/CodeSamples/api.h
> @@ -243,7 +243,7 @@ static __inline__ int spin_is_locked(spinlock_t *sp)
>  #define spin_unlock_irqrestore(l, f) do { f = 0; spin_unlock(l); } while (0)
>
>  #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> -#define READ_ONCE(x) ACCESS_ONCE(x)
> +#define READ_ONCE(x) ({ typeof(x) ___x = ACCESS_ONCE(x); ___x; })
>  #define WRITE_ONCE(x, val) ({ ACCESS_ONCE(x) = (val); })
>  #ifndef unlikely
>  #define unlikely(x) x
> diff --git a/CodeSamples/count/count_stat_eventual.c b/CodeSamples/count/count_stat_eventual.c
> index 2caebfd1fb26..464de30d7aa8 100644
> --- a/CodeSamples/count/count_stat_eventual.c
> +++ b/CodeSamples/count/count_stat_eventual.c
> @@ -27,7 +27,8 @@ int stopflag;
>
>  void inc_count(void)
>  {
> -        READ_ONCE(__get_thread_var(counter))++;
> +        WRITE_ONCE(__get_thread_var(counter),
> +                   READ_ONCE(__get_thread_var(counter)) + 1);
>  }
>
>  unsigned long read_count(void)
> diff --git a/count/count.tex b/count/count.tex
> index 096b53d1d2bb..ae624c650c11 100644
> --- a/count/count.tex
> +++ b/count/count.tex
> @@ -746,56 +746,58 @@ eventually consistent.
>  \begin{figure}[tbp]
>  { \scriptsize
>  \begin{verbbox}
> -  1 DEFINE_PER_THREAD(unsigned long, counter);
> -  2 unsigned long global_count;
> -  3 int stopflag;
> -  4
> -  5 void inc_count(void)
> -  6 {
> -  7   ACCESS_ONCE(__get_thread_var(counter))++;
> -  8 }
> -  9
> - 10 unsigned long read_count(void)
> - 11 {
> - 12   return ACCESS_ONCE(global_count);
> - 13 }
> - 14
> - 15 void *eventual(void *arg)
> - 16 {
> - 17   int t;
> - 18   int sum;
> - 19
> - 20   while (stopflag < 3) {
> - 21     sum = 0;
> - 22     for_each_thread(t)
> - 23       sum += ACCESS_ONCE(per_thread(counter, t));
> - 24     ACCESS_ONCE(global_count) = sum;
> - 25     poll(NULL, 0, 1);
> - 26     if (stopflag) {
> - 27       smp_mb();
> - 28       stopflag++;
> - 29     }
> - 30   }
> - 31   return NULL;
> - 32 }
> - 33
> - 34 void count_init(void)
> - 35 {
> - 36   thread_id_t tid;
> - 37
> - 38   if (pthread_create(&tid, NULL, eventual, NULL)) {
> - 39     perror("count_init:pthread_create");
> - 40     exit(-1);
> - 41   }
> - 42 }
> - 43
> - 44 void count_cleanup(void)
> - 45 {
> - 46   stopflag = 1;
> - 47   while (stopflag < 3)
> - 48     poll(NULL, 0, 1);
> - 49   smp_mb();
> - 50 }
> +  1 DEFINE_PER_THREAD(unsigned long, counter);
> +  2 unsigned long global_count;
> +  3 int stopflag;
> +  4
> +  5 void inc_count(void)
> +  6 {
> +  7   WRITE_ONCE(__get_thread_var(counter),
> +  8              READ_ONCE(__get_thread_var(counter)) + 1);
> +  9 }
> +10
> +11 unsigned long read_count(void)
> +12 {
> +13   return READ_ONCE(global_count);
> +14 }
> +15
> +16 void *eventual(void *arg)
> +17 {
> +18   int t;
> +19   unsigned long sum;
> +20
> +21   while (READ_ONCE(stopflag) < 3) {
> +22     sum = 0;
> +23     for_each_thread(t)
> +24       sum += READ_ONCE(per_thread(counter, t));
> +25     WRITE_ONCE(global_count, sum);
> +26     poll(NULL, 0, 1);
> +27     if (READ_ONCE(stopflag)) {
> +28       smp_mb();
> +29       WRITE_ONCE(stopflag, READ_ONCE(stopflag) + 1);
> +30     }
> +31   }
> +32   return NULL;
> +33 }
> +34
> +35 void count_init(void)
> +36 {
> +37   thread_id_t tid;
> +38
> +39   if (pthread_create(&tid, NULL, eventual, NULL) != 0) {
> +40     perror("count_init:pthread_create");
> +41     exit(-1);
> +42   }
> +43 }
> +44
> +45 void count_cleanup(void)
> +46 {
> +47   WRITE_ONCE(stopflag, 1);
> +48   smp_mb();
> +49   while (READ_ONCE(stopflag) < 3)
> +50     poll(NULL, 0, 1);
> +51   smp_mb();
> +52 }
>  \end{verbbox}
>  }
>  \centering
> @@ -811,20 +813,20 @@ Lines~1-2 show the per-thread variable and the global variable that
>  track the counter's value, and line three shows \co{stopflag}
>  which is used to coordinate termination (for the case where we want
>  to terminate the program with an accurate counter value).
> -The \co{inc_count()} function shown on lines~5-8 is similar to its
> +The \co{inc_count()} function shown on lines~5-9 is similar to its
>  counterpart in
>  Figure~\ref{fig:count:Array-Based Per-Thread Statistical Counters}.
> -The \co{read_count()} function shown on lines~10-13 simply returns the
> +The \co{read_count()} function shown on lines~11-14 simply returns the
>  value of the \co{global_count} variable.
>
> -However, the \co{count_init()} function on lines~34-42
> -creates the \co{eventual()} thread shown on lines~15-32, which
> +However, the \co{count_init()} function on lines~35-43
> +creates the \co{eventual()} thread shown on lines~16-33, which
>  cycles through all the threads,
>  summing the per-thread local \co{counter} and storing the
>  sum to the \co{global_count} variable.
>  The \co{eventual()} thread waits an arbitrarily chosen one millisecond
>  between passes.
> -The \co{count_cleanup()} function on lines~44-50 coordinates termination.
> +The \co{count_cleanup()} function on lines~45-52 coordinates termination.
>
>  This approach gives extremely fast counter read-out while still
>  supporting linear counter-update performance.