Re: [PATCH 1/2] count_stat_eventual: Switch from ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE()

On Mon, May 15, 2017 at 11:35:51PM +0900, Akira Yokosawa wrote:
> On 2017/05/14 17:31:04 -0700, Paul E. McKenney wrote:
> > On Mon, May 15, 2017 at 12:15:34AM +0900, Akira Yokosawa wrote:
> >> On 2017/05/13 21:20:23 -0700, Paul E. McKenney wrote:
> >>> On Sun, May 14, 2017 at 10:31:51AM +0900, Akira Yokosawa wrote:
> >>>> On 2017/05/14 9:58, Akira Yokosawa wrote:
> >>>>> On 2017/05/14 7:56, Paul E. McKenney wrote:
> >>>>>> On Sat, May 13, 2017 at 11:37:06PM +0900, Akira Yokosawa wrote:
> >>>>>>> On 2017/05/13 05:45:56 -0700, Paul E. McKenney wrote:
> >>>>>>>> On Sat, May 13, 2017 at 09:04:38PM +0900, Akira Yokosawa wrote:
> >>>>>>>>> Hi Jason & Paul,
> >>>>>>>>>
> >>>>>>>>> although this has already been applied, I have a comment.
> >>>>>>>>>
> >>>>>>>>> On 2017/05/11 23:03:41 +0800, Junchang Wang wrote:
> >>>>>>>>>> Signed-off-by: Junchang Wang <junchangwang@xxxxxxxxx>
> >>>>>>>>>> ---
> >>>>>>>>>>  CodeSamples/count/count_stat_eventual.c | 8 ++++----
> >>>>>>>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/CodeSamples/count/count_stat_eventual.c b/CodeSamples/count/count_stat_eventual.c
> >>>>>>>>>> index 059ab8b..cbde4aa 100644
> >>>>>>>>>> --- a/CodeSamples/count/count_stat_eventual.c
> >>>>>>>>>> +++ b/CodeSamples/count/count_stat_eventual.c
> >>>>>>>>>> @@ -27,12 +27,12 @@ int stopflag;
> >>>>>>>>>>  
> >>>>>>>>>>  void inc_count(void)
> >>>>>>>>>>  {
> >>>>>>>>>> -	ACCESS_ONCE(__get_thread_var(counter))++;
> >>>>>>>>>> +	READ_ONCE(__get_thread_var(counter))++;
> >>>>>>>>>
> >>>>>>>>> This works because READ_ONCE() is defined to be the same as
> >>>>>>>>> ACCESS_ONCE() in CodeSamples. However, the definition in the
> >>>>>>>>> current Linux kernel would not permit it.
> >>>>>>>>>
> >>>>>>>>> A read-modify-write access would need both READ_ONCE() and WRITE_ONCE().
> >>>>>>>>> However, since "counter" is thread local and updated only by its owner,
> >>>>>>>>> we don't need READ_ONCE() here. So:
> >>>>>>>>>
> >>>>>>>>> +	WRITE_ONCE(__get_thread_var(counter), __get_thread_var(counter) + 1);
> >>>>>>>>>
> >>>>>>>>> should have been sufficient.
> >>
> >> So, I measured the performance after this change.
> >> Unfortunately, it doubles the updater's overhead.
> >>
> >> Inspecting the generated code, it looks like GCC (v5.4 for x86_64 with -O3)
> >> can't tell that the two "__get_thread_var(counter)"s refer to the same
> >> location, and uses two separate registers as pointers to access it.
> >> __get_thread_var() seems to be complex enough to confuse the compiler.
> >>
> >> To avoid this performance regression, we can use a temporary pointer as a hint for
> >> optimization:
> >>
> >> 	void inc_count(void)
> >> 	{
> >> 		unsigned long *p_cnt = &__get_thread_var(counter);
> >>
> >> 		WRITE_ONCE(*p_cnt, *p_cnt + 1);
> >> 	}
> >>
> >> Another idea is to restore ACCESS_ONCE().
> >> As long as the argument is a properly aligned scalar type, ACCESS_ONCE()
> >> should be OK.
> >>
> >> 	ACCESS_ONCE(__get_thread_var(counter))++;
> >>
> >> will emit both load and store instructions, but in this short function,
> >> the compiler has no way to optimize away either access even if we don't use
> >> ACCESS_ONCE().
> >>
> >> Thoughts?
> > 
> > I am tempted to deprecate ACCESS_ONCE(), but yes, it is hard for the
> > compiler to optimize.  Unless, that is, you allow the compiler to
> > inline the function.  Which is the case by default.
> > 
> > Getting back to READ_ONCE(), how does the following patch look?
> 
> So, you've already pushed it.

Good point -- I should have held it back for review.  One of those days.

Still, changes can be made as needed.

> As for the definition of READ_ONCE(), it looks good.
> 
> You might also want to update the definition of READ_ONCE() at the end of
> Section 4.2.5.
> 
> As for the code snippet, I'm anticipating your take on the discussion
> on Patch 2/2 in another thread.

Will do!

							Thanx, Paul

> Optimization of inc_count() can wait for the time being.
> 
>                  Thanks, Akira
> 
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 85747990ec809f57d61d30cd27bdc18b58163681
> > Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > Date:   Sun May 14 17:26:41 2017 -0700
> > 
> >     count: Don't in-place increment a READ_ONCE()
> >     
> >     This commit prohibits READ_ONCE(x)++ and fixes the occurrence of this
> >     in count_stat_eventual.c.
> >     
> >     Reported-by: Akira Yokosawa <akiyks@xxxxxxxxx>
> >     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > 
> > diff --git a/CodeSamples/api-pthreads/api-pthreads.h b/CodeSamples/api-pthreads/api-pthreads.h
> > index b12c3e14911c..c6031f7d59b7 100644
> > --- a/CodeSamples/api-pthreads/api-pthreads.h
> > +++ b/CodeSamples/api-pthreads/api-pthreads.h
> > @@ -127,7 +127,7 @@ static __inline__ int spin_is_locked(spinlock_t *sp)
> >  #define spin_unlock_irqrestore(l, f) do { f = 0; spin_unlock(l); } while (0)
> >  
> >  #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> > -#define READ_ONCE(x) ACCESS_ONCE(x)
> > +#define READ_ONCE(x) ({ typeof(x) ___x = ACCESS_ONCE(x); ___x; })
> >  #define WRITE_ONCE(x, val) ({ ACCESS_ONCE(x) = (val); })
> >  #ifndef unlikely
> >  #define unlikely(x) x
> > diff --git a/CodeSamples/api.h b/CodeSamples/api.h
> > index a18c70f2346c..3b4bc1629c7a 100644
> > --- a/CodeSamples/api.h
> > +++ b/CodeSamples/api.h
> > @@ -243,7 +243,7 @@ static __inline__ int spin_is_locked(spinlock_t *sp)
> >  #define spin_unlock_irqrestore(l, f) do { f = 0; spin_unlock(l); } while (0)
> >  
> >  #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> > -#define READ_ONCE(x) ACCESS_ONCE(x)
> > +#define READ_ONCE(x) ({ typeof(x) ___x = ACCESS_ONCE(x); ___x; })
> >  #define WRITE_ONCE(x, val) ({ ACCESS_ONCE(x) = (val); })
> >  #ifndef unlikely
> >  #define unlikely(x) x
> > diff --git a/CodeSamples/count/count_stat_eventual.c b/CodeSamples/count/count_stat_eventual.c
> > index 2caebfd1fb26..464de30d7aa8 100644
> > --- a/CodeSamples/count/count_stat_eventual.c
> > +++ b/CodeSamples/count/count_stat_eventual.c
> > @@ -27,7 +27,8 @@ int stopflag;
> >  
> >  void inc_count(void)
> >  {
> > -	READ_ONCE(__get_thread_var(counter))++;
> > +	WRITE_ONCE(__get_thread_var(counter),
> > +		   READ_ONCE(__get_thread_var(counter)) + 1);
> >  }
> >  
> >  unsigned long read_count(void)
> > diff --git a/count/count.tex b/count/count.tex
> > index 096b53d1d2bb..ae624c650c11 100644
> > --- a/count/count.tex
> > +++ b/count/count.tex
> > @@ -746,56 +746,58 @@ eventually consistent.
> >  \begin{figure}[tbp]
> >  { \scriptsize
> >  \begin{verbbox}
> > -  1 DEFINE_PER_THREAD(unsigned long, counter);
> > -  2 unsigned long global_count;
> > -  3 int stopflag;
> > -  4 
> > -  5 void inc_count(void)
> > -  6 {
> > -  7   ACCESS_ONCE(__get_thread_var(counter))++;
> > -  8 }
> > -  9 
> > - 10 unsigned long read_count(void)
> > - 11 {
> > - 12   return ACCESS_ONCE(global_count);
> > - 13 }
> > - 14 
> > - 15 void *eventual(void *arg)
> > - 16 {
> > - 17   int t;
> > - 18   int sum;
> > - 19 
> > - 20   while (stopflag < 3) {
> > - 21     sum = 0;
> > - 22     for_each_thread(t)
> > - 23       sum += ACCESS_ONCE(per_thread(counter, t));
> > - 24     ACCESS_ONCE(global_count) = sum;
> > - 25     poll(NULL, 0, 1);
> > - 26     if (stopflag) {
> > - 27       smp_mb();
> > - 28       stopflag++;
> > - 29     }
> > - 30   }
> > - 31   return NULL;
> > - 32 }
> > - 33 
> > - 34 void count_init(void)
> > - 35 {
> > - 36   thread_id_t tid;
> > - 37 
> > - 38   if (pthread_create(&tid, NULL, eventual, NULL)) {
> > - 39     perror("count_init:pthread_create");
> > - 40     exit(-1);
> > - 41   }
> > - 42 }
> > - 43 
> > - 44 void count_cleanup(void)
> > - 45 {
> > - 46   stopflag = 1;
> > - 47   while (stopflag < 3)
> > - 48     poll(NULL, 0, 1);
> > - 49   smp_mb();
> > - 50 }
> > + 1 DEFINE_PER_THREAD(unsigned long, counter);
> > + 2 unsigned long global_count;
> > + 3 int stopflag;
> > + 4
> > + 5 void inc_count(void)
> > + 6 {
> > + 7   WRITE_ONCE(__get_thread_var(counter),
> > + 8              READ_ONCE(__get_thread_var(counter)) + 1);
> > + 9 }
> > +10
> > +11 unsigned long read_count(void)
> > +12 {
> > +13   return READ_ONCE(global_count);
> > +14 }
> > +15
> > +16 void *eventual(void *arg)
> > +17 {
> > +18   int t;
> > +19   unsigned long sum;
> > +20
> > +21   while (READ_ONCE(stopflag) < 3) {
> > +22     sum = 0;
> > +23     for_each_thread(t)
> > +24       sum += READ_ONCE(per_thread(counter, t));
> > +25     WRITE_ONCE(global_count, sum);
> > +26     poll(NULL, 0, 1);
> > +27     if (READ_ONCE(stopflag)) {
> > +28       smp_mb();
> > +29       WRITE_ONCE(stopflag, READ_ONCE(stopflag) + 1);
> > +30     }
> > +31   }
> > +32   return NULL;
> > +33 }
> > +34
> > +35 void count_init(void)
> > +36 {
> > +37   thread_id_t tid;
> > +38
> > +39   if (pthread_create(&tid, NULL, eventual, NULL) != 0) {
> > +40     perror("count_init:pthread_create");
> > +41     exit(-1);
> > +42   }
> > +43 }
> > +44
> > +45 void count_cleanup(void)
> > +46 {
> > +47   WRITE_ONCE(stopflag, 1);
> > +48   smp_mb();
> > +49   while (READ_ONCE(stopflag) < 3)
> > +50     poll(NULL, 0, 1);
> > +51   smp_mb();
> > +52 }
> >  \end{verbbox}
> >  }
> >  \centering
> > @@ -811,20 +813,20 @@ Lines~1-2 show the per-thread variable and the global variable that
> >  track the counter's value, and line three shows \co{stopflag}
> >  which is used to coordinate termination (for the case where we want
> >  to terminate the program with an accurate counter value).
> > -The \co{inc_count()} function shown on lines~5-8 is similar to its
> > +The \co{inc_count()} function shown on lines~5-9 is similar to its
> >  counterpart in
> >  Figure~\ref{fig:count:Array-Based Per-Thread Statistical Counters}.
> > -The \co{read_count()} function shown on lines~10-13 simply returns the
> > +The \co{read_count()} function shown on lines~11-14 simply returns the
> >  value of the \co{global_count} variable.
> >  
> > -However, the \co{count_init()} function on lines~34-42
> > -creates the \co{eventual()} thread shown on lines~15-32, which
> > +However, the \co{count_init()} function on lines~35-43
> > +creates the \co{eventual()} thread shown on lines~16-33, which
> >  cycles through all the threads,
> >  summing the per-thread local \co{counter} and storing the
> >  sum to the \co{global_count} variable.
> >  The \co{eventual()} thread waits an arbitrarily chosen one millisecond
> >  between passes.
> > -The \co{count_cleanup()} function on lines~44-50 coordinates termination.
> > +The \co{count_cleanup()} function on lines~45-52 coordinates termination.
> >  
> >  This approach gives extremely fast counter read-out while still
> >  supporting linear counter-update performance.
> > 
> > 
> 
