On Tue, Oct 02, 2018 at 07:48:15AM +0900, Akira Yokosawa wrote: > On 2018/09/17 0:08, Paul E. McKenney wrote: > > On Sun, Sep 16, 2018 at 05:37:23PM +0900, Akira Yokosawa wrote: > >> Hi Paul, > >> > >> Every time I review code under CodeSamples/, > >> I find myself confused where to use READ_ONCE/WRITE_ONCEs. > >> > >> I'm looking at Listing 5.3 of current master. > >> > >> There are two cases which lack READ_ONCE/WRITE_ONCE to access potentially > >> shared variables, namely on line 5 (__get_thread_var(counter)++;) and > >> on line 14 (sum += per_thread(counter, t);). > >> > >> Line 5 looks like a good candidate to be optimized out when inlined. > >> But the performance result indicates "gcc -O3" keeps it inside the loop. > >> > >> Is this because the definition of __get_thread_var() contains > >> a call to smp_thread_id() and complicated enough not to be optimized > >> out? > >> > >> As for line 14, as per_thread() was derived from per_cpu() of kernel > >> API, I looked for call sites of per_cpu() in the kernel source tree. > >> > >> There are very few cases where READ_ONCE/WRITE_ONCE is used along > >> with per_cpu(). There are two READ_ONCEs with per_cpu() in > >> kernel/rcu/srcutree.c, whose author is none other than you. > >> Are those READ_ONCEs necessary? > >> > >> I don't grasp the actual definition of per_cpu() macro. > >> Definition of per_thread() macro under CodeSamples/api-pthreads/ > >> does not look so complicated, but contains array indexing, > >> which might be good enough to prevent optimization in the loop. > >> > >> I'm not sure, but my gut feeling is that READ_ONCE/WRITE_ONCE > >> is necessary to access an unannotated variable. If we need > >> volatility for sure, we could modify the definition of annotating > >> macros/functions. > >> > >> Can you enlighten me? > > > > Yes, I do need to expand on this, both in perfbook and in order to inspect > > my usage in Linux-kernel RCU. Please see below for my current outline, > > and any and all feedback would be deeply appreciated! > > > > So, I applied your outline to accesses in Listing 5.3. > See inline comments below for my observation. > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > > > Plain accesses, that is, without either READ_ONCE() or WRITE_ONCE(): > > o Only accessed under lock. > > o Only accessed by owning CPU/thread. > > o Only modified by owning CPU/thread under lock, and otherwise > > accessed only either while holding lock or by owning CPU/thread. > > > > WRITE_ONCE() for modifications, READ_ONCE() for non-owning CPUs/threads > > not holding lock: > > o Only modified while holding lock by owning CPU/thread, > > could be read from anywhere by anyone. > > > > WRITE_ONCE() for modifications, READ_ONCE() for CPUs/threads not holding > > lock: > > o Only modified while holding lock, could be read anywhere by > > anyone. > > > > WRITE_ONCE() for modifications, READ_ONCE() for non-owning > > CPUs/threads: > > o Only modified by owning CPU/threads, could be read anywhere > > by anyone. > > Listing 5.3's per-thread variable "counter" looks covered here. > So the write part of the increment on line 5 needs WRITE_ONCE(). > Read access on line 14 also needs READ_ONCE(). > > Read part of the increment doesn't need READ_ONCE(), I guess. > > Line 5 would look like: > > WRITE_ONCE(__get_thread_var(counter), __get_thread_var(counter) + 1); > > . Considering the cost of __get_thread_var(), there should be a room > to help compilers to optimize by using a local pointer as follows: > > static __inline__ void inc_count(void) > { > long *p_counter = &__get_thread_var(counter); > > WRITE_ONCE(*p_counter, *p_counter + 1); > } > > . Does this look reasonable to you? Or is it better to also use > READ_ONCE()? Yes, this is fine. You don't need READ_ONCE() because threads only modify the counters that they own, and this case is a thread reading its own counter -- thus concurrent update cannot happen. > But if __get_thread_var() does imply volatility, we can keep the > current way of increment: > > __get_thread_var(counter)++; Unfortunately, it does not, at least not the way that it is currently implemented. > Thoughts? Your taking the pointer and using WRITE_ONCE() looks good! Thanx, Paul > Thanks, Akira > > > > > Otherwise, READ_ONCE() and WRITE_ONCE() everywhere. > > > > But note that READ_ONCE() and WRITE_ONCE() provide absolutely no > > ordering guarantees unless: > > > > o There is only one variable being accesses, and all of those > > accesses are of the same size and alignment. > > > > o There is only one CPU/thread doing the accesses to all > > of the variables during the timeframe of interest. > > > > o There are other operations involved, for example, smp_mb(). > > >