On Mon, May 15, 2017 at 11:35:51PM +0900, Akira Yokosawa wrote:
> On 2017/05/14 17:31:04 -0700, Paul E. McKenney wrote:
> > On Mon, May 15, 2017 at 12:15:34AM +0900, Akira Yokosawa wrote:
> >> On 2017/05/13 21:20:23 -0700, Paul E. McKenney wrote:
> >>> On Sun, May 14, 2017 at 10:31:51AM +0900, Akira Yokosawa wrote:
> >>>> On 2017/05/14 9:58, Akira Yokosawa wrote:
> >>>>> On 2017/05/14 7:56, Paul E. McKenney wrote:
> >>>>>> On Sat, May 13, 2017 at 11:37:06PM +0900, Akira Yokosawa wrote:
> >>>>>>> On 2017/05/13 05:45:56 -0700, Paul E. McKenney wrote:
> >>>>>>>> On Sat, May 13, 2017 at 09:04:38PM +0900, Akira Yokosawa wrote:
> >>>>>>>>> Hi Jason & Paul,
> >>>>>>>>> 
> >>>>>>>>> although this has already been applied, I have a comment.
> >>>>>>>>> 
> >>>>>>>>> On 2017/05/11 23:03:41 +0800, Junchang Wang wrote:
> >>>>>>>>>> Signed-off-by: Junchang Wang <junchangwang@xxxxxxxxx>
> >>>>>>>>>> ---
> >>>>>>>>>>  CodeSamples/count/count_stat_eventual.c | 8 ++++----
> >>>>>>>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>>>>>>>>> 
> >>>>>>>>>> diff --git a/CodeSamples/count/count_stat_eventual.c b/CodeSamples/count/count_stat_eventual.c
> >>>>>>>>>> index 059ab8b..cbde4aa 100644
> >>>>>>>>>> --- a/CodeSamples/count/count_stat_eventual.c
> >>>>>>>>>> +++ b/CodeSamples/count/count_stat_eventual.c
> >>>>>>>>>> @@ -27,12 +27,12 @@ int stopflag;
> >>>>>>>>>>  
> >>>>>>>>>>  void inc_count(void)
> >>>>>>>>>>  {
> >>>>>>>>>> -	ACCESS_ONCE(__get_thread_var(counter))++;
> >>>>>>>>>> +	READ_ONCE(__get_thread_var(counter))++;
> >>>>>>>>> 
> >>>>>>>>> This is OK because READ_ONCE() is defined as the same as ACCESS_ONCE()
> >>>>>>>>> in CodeSamples.  However, the definition in the current Linux kernel
> >>>>>>>>> would not permit this.
> >>>>>>>>> 
> >>>>>>>>> A read-modify-write access would need both READ_ONCE() and WRITE_ONCE().
> >>>>>>>>> However, since "counter" is thread local and updated only by its owner,
> >>>>>>>>> we don't need READ_ONCE() here.  So:
> >>>>>>>>> 
> >>>>>>>>> +	WRITE_ONCE(__get_thread_var(counter), __get_thread_var(counter) + 1);
> >>>>>>>>> 
> >>>>>>>>> should have been sufficient.
> >> 
> >> So, I measured the performance after this change.
> >> Unfortunately, this change doubles the overhead of updater.
> >> 
> >> By inspecting the generated code, it looks like GCC (v5.4 for x86_64 with -O3 option)
> >> can't figure out two "__get_thread_var(counter)"s are identical, and uses two
> >> registers as pointers to access it.  __get_thread_var() seems complex enough
> >> to obfuscate the compiler.
> >> 
> >> To avoid this performance regression, we can use a temporary pointer as a hint for
> >> optimization:
> >> 
> >> void inc_count(void)
> >> {
> >> 	unsigned long *p_cnt = &__get_thread_var(counter);
> >> 
> >> 	WRITE_ONCE(*p_cnt, *p_cnt + 1);
> >> }
> >> 
> >> Another idea is to restore ACCESS_ONCE().
> >> As long as the argument is properly aligned scalar type, ACCESS_ONCE() should be OK.
> >> 
> >> 	ACCESS_ONCE(__get_thread_var(counter))++;
> >> 
> >> will emit both load and store instructions, but in this short function,
> >> the compiler has no way to optimize away either access even if we don't use
> >> ACCESS_ONCE().
> >> 
> >> Thoughts?
> > 
> > I am tempted to deprecate ACCESS_ONCE(), but yes, it is hard for the
> > compiler to optimize.  Unless, that is, you allow the compiler to
> > inline the function.  Which is the case by default.
> > 
> > Getting back to READ_ONCE(), how does the following patch look?
> 
> So, you've already pushed it.

Good point -- I should have held it back for review.
One of those days.  Still, changes can be made as needed.

> As for the definition of READ_ONCE(), it looks good.
> 
> You might also want to update the definition of READ_ONCE() at the end of
> Section 4.2.5.
> 
> As for the code snippet, I'm anticipating your take on the discussion
> on Patch 2/2 in another thread.

Will do!

							Thanx, Paul

> Optimization of inc_count() can wait for the time being.
> 
> Thanks, Akira
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 85747990ec809f57d61d30cd27bdc18b58163681
> > Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > Date: Sun May 14 17:26:41 2017 -0700
> > 
> >     count: Don't in-place increment a READ_ONCE()
> > 
> >     This commit prohibits READ_ONCE(x)++ and fixes the occurrence of this
> >     in count_stat_eventual.c.
> > 
> >     Reported-by: Akira Yokosawa <akiyks@xxxxxxxxx>
> >     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > 
> > diff --git a/CodeSamples/api-pthreads/api-pthreads.h b/CodeSamples/api-pthreads/api-pthreads.h
> > index b12c3e14911c..c6031f7d59b7 100644
> > --- a/CodeSamples/api-pthreads/api-pthreads.h
> > +++ b/CodeSamples/api-pthreads/api-pthreads.h
> > @@ -127,7 +127,7 @@ static __inline__ int spin_is_locked(spinlock_t *sp)
> >  #define spin_unlock_irqrestore(l, f) do { f = 0; spin_unlock(l); } while (0)
> >  
> >  #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> > -#define READ_ONCE(x) ACCESS_ONCE(x)
> > +#define READ_ONCE(x) ({ typeof(x) ___x = ACCESS_ONCE(x); ___x; })
> >  #define WRITE_ONCE(x, val) ({ ACCESS_ONCE(x) = (val); })
> >  #ifndef unlikely
> >  #define unlikely(x) x
> > diff --git a/CodeSamples/api.h b/CodeSamples/api.h
> > index a18c70f2346c..3b4bc1629c7a 100644
> > --- a/CodeSamples/api.h
> > +++ b/CodeSamples/api.h
> > @@ -243,7 +243,7 @@ static __inline__ int spin_is_locked(spinlock_t *sp)
> >  #define spin_unlock_irqrestore(l, f) do { f = 0; spin_unlock(l); } while (0)
> >  
> >  #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> > -#define READ_ONCE(x) ACCESS_ONCE(x)
> > +#define READ_ONCE(x) ({ typeof(x) ___x = ACCESS_ONCE(x); ___x; })
> >  #define WRITE_ONCE(x, val) ({ ACCESS_ONCE(x) = (val); })
> >  #ifndef unlikely
> >  #define unlikely(x) x
> > diff --git a/CodeSamples/count/count_stat_eventual.c b/CodeSamples/count/count_stat_eventual.c
> > index 2caebfd1fb26..464de30d7aa8 100644
> > --- a/CodeSamples/count/count_stat_eventual.c
> > +++ b/CodeSamples/count/count_stat_eventual.c
> > @@ -27,7 +27,8 @@ int stopflag;
> >  
> >  void inc_count(void)
> >  {
> > -	READ_ONCE(__get_thread_var(counter))++;
> > +	WRITE_ONCE(__get_thread_var(counter),
> > +		   READ_ONCE(__get_thread_var(counter)) + 1);
> >  }
> >  
> >  unsigned long read_count(void)
> > diff --git a/count/count.tex b/count/count.tex
> > index 096b53d1d2bb..ae624c650c11 100644
> > --- a/count/count.tex
> > +++ b/count/count.tex
> > @@ -746,56 +746,58 @@ eventually consistent.
> >  \begin{figure}[tbp]
> >  { \scriptsize
> >  \begin{verbbox}
> > -  1 DEFINE_PER_THREAD(unsigned long, counter);
> > -  2 unsigned long global_count;
> > -  3 int stopflag;
> > -  4
> > -  5 void inc_count(void)
> > -  6 {
> > -  7   ACCESS_ONCE(__get_thread_var(counter))++;
> > -  8 }
> > -  9
> > - 10 unsigned long read_count(void)
> > - 11 {
> > - 12   return ACCESS_ONCE(global_count);
> > - 13 }
> > - 14
> > - 15 void *eventual(void *arg)
> > - 16 {
> > - 17   int t;
> > - 18   int sum;
> > - 19
> > - 20   while (stopflag < 3) {
> > - 21     sum = 0;
> > - 22     for_each_thread(t)
> > - 23       sum += ACCESS_ONCE(per_thread(counter, t));
> > - 24     ACCESS_ONCE(global_count) = sum;
> > - 25     poll(NULL, 0, 1);
> > - 26     if (stopflag) {
> > - 27       smp_mb();
> > - 28       stopflag++;
> > - 29     }
> > - 30   }
> > - 31   return NULL;
> > - 32 }
> > - 33
> > - 34 void count_init(void)
> > - 35 {
> > - 36   thread_id_t tid;
> > - 37
> > - 38   if (pthread_create(&tid, NULL, eventual, NULL)) {
> > - 39     perror("count_init:pthread_create");
> > - 40     exit(-1);
> > - 41   }
> > - 42 }
> > - 43
> > - 44 void count_cleanup(void)
> > - 45 {
> > - 46   stopflag = 1;
> > - 47   while (stopflag < 3)
> > - 48     poll(NULL, 0, 1);
> > - 49   smp_mb();
> > - 50 }
> > + 1 DEFINE_PER_THREAD(unsigned long, counter);
> > + 2 unsigned long global_count;
> > + 3 int stopflag;
> > + 4
> > + 5 void inc_count(void)
> > + 6 {
> > + 7   WRITE_ONCE(__get_thread_var(counter),
> > + 8              READ_ONCE(__get_thread_var(counter)) + 1);
> > + 9 }
> > +10
> > +11 unsigned long read_count(void)
> > +12 {
> > +13   return READ_ONCE(global_count);
> > +14 }
> > +15
> > +16 void *eventual(void *arg)
> > +17 {
> > +18   int t;
> > +19   unsigned long sum;
> > +20
> > +21   while (READ_ONCE(stopflag) < 3) {
> > +22     sum = 0;
> > +23     for_each_thread(t)
> > +24       sum += READ_ONCE(per_thread(counter, t));
> > +25     WRITE_ONCE(global_count, sum);
> > +26     poll(NULL, 0, 1);
> > +27     if (READ_ONCE(stopflag)) {
> > +28       smp_mb();
> > +29       WRITE_ONCE(stopflag, READ_ONCE(stopflag) + 1);
> > +30     }
> > +31   }
> > +32   return NULL;
> > +33 }
> > +34
> > +35 void count_init(void)
> > +36 {
> > +37   thread_id_t tid;
> > +38
> > +39   if (pthread_create(&tid, NULL, eventual, NULL) != 0) {
> > +40     perror("count_init:pthread_create");
> > +41     exit(-1);
> > +42   }
> > +43 }
> > +44
> > +45 void count_cleanup(void)
> > +46 {
> > +47   WRITE_ONCE(stopflag, 1);
> > +48   smp_mb();
> > +49   while (READ_ONCE(stopflag) < 3)
> > +50     poll(NULL, 0, 1);
> > +51   smp_mb();
> > +52 }
> >  \end{verbbox}
> >  }
> >  \centering
> > @@ -811,20 +813,20 @@ Lines~1-2 show the per-thread variable and the global variable that
> >  track the counter's value, and line three shows \co{stopflag}
> >  which is used to coordinate termination (for the case where we want
> >  to terminate the program with an accurate counter value).
> > -The \co{inc_count()} function shown on lines~5-8 is similar to its
> > +The \co{inc_count()} function shown on lines~5-9 is similar to its
> >  counterpart in
> >  Figure~\ref{fig:count:Array-Based Per-Thread Statistical Counters}.
> > -The \co{read_count()} function shown on lines~10-13 simply returns the
> > +The \co{read_count()} function shown on lines~11-14 simply returns the
> >  value of the \co{global_count} variable.
> > 
> > -However, the \co{count_init()} function on lines~34-42
> > -creates the \co{eventual()} thread shown on lines~15-32, which
> > +However, the \co{count_init()} function on lines~35-43
> > +creates the \co{eventual()} thread shown on lines~16-33, which
> >  cycles through all the threads,
> >  summing the per-thread local \co{counter} and storing the
> >  sum to the \co{global_count} variable.
> >  The \co{eventual()} thread waits an arbitrarily chosen one millisecond
> >  between passes.
> > -The \co{count_cleanup()} function on lines~44-50 coordinates termination.
> > +The \co{count_cleanup()} function on lines~45-52 coordinates termination.
> > 
> >  This approach gives extremely fast counter read-out while still
> >  supporting linear counter-update performance.
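
For readers working through the patch above, here is a minimal standalone
sketch of what the new READ_ONCE() definition accomplishes.  It is not part
of the patch: the OLD_READ_ONCE name and the main() wrapper exist only for
illustration, and GCC's typeof and statement-expression extensions are
assumed, just as in the CodeSamples headers.  The point is that the old
definition is an lvalue, so an in-place increment compiles, whereas the new
definition returns an rvalue copy and rejects it.

/* read_once_sketch.c: illustration only, not part of the patch above. */
#include <stdio.h>

#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

/* Old CodeSamples definition: an lvalue, so OLD_READ_ONCE(x)++ compiles. */
#define OLD_READ_ONCE(x) ACCESS_ONCE(x)

/* New definition from the patch: the statement expression returns a copy,
 * so READ_ONCE(x)++ fails to compile. */
#define READ_ONCE(x) ({ typeof(x) ___x = ACCESS_ONCE(x); ___x; })
#define WRITE_ONCE(x, val) ({ ACCESS_ONCE(x) = (val); })

unsigned long counter;

int main(void)
{
	OLD_READ_ONCE(counter)++;    /* Legal: the dereferenced cast is an lvalue. */
	/* READ_ONCE(counter)++; */  /* Does not compile: an rvalue cannot be incremented. */
	WRITE_ONCE(counter, READ_ONCE(counter) + 1); /* The idiom now used in inc_count(). */
	printf("counter = %lu\n", counter);
	return 0;
}

As noted earlier in the thread, the READ_ONCE() in inc_count() is not
strictly required for correctness, because each per-thread counter is
updated only by its owner; the WRITE_ONCE() is what matters, since
eventual() reads the counters concurrently.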