Thanks Paul,

2017-10-22 1:45 GMT+08:00 Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
> On Sat, Oct 21, 2017 at 09:58:12PM +0800, Yubin Ruan wrote:
>> Hi,
>>
>> In Listing 7.7, a hierarchical/conditional locking example is used to
>> show how to reduce lock contention:
>>
>>  1 void force_quiescent_state(struct rcu_node *rnp_leaf)
>>  2 {
>>  3   int ret;
>>  4   struct rcu_node *rnp = rnp_leaf;
>>  5   struct rcu_node *rnp_old = NULL;
>>  6
>>  7   for (; rnp != NULL; rnp = rnp->parent) {
>>  8     ret = (ACCESS_ONCE(gp_flags)) ||
>>  9           !raw_spin_trylock(&rnp->fqslock);
>> 10     if (rnp_old != NULL)
>> 11       raw_spin_unlock(&rnp_old->fqslock);
>> 12     if (ret)
>> 13       return;
>> 14     rnp_old = rnp;
>> 15   }
>> 16   if (!ACCESS_ONCE(gp_flags)) {
>> 17     ACCESS_ONCE(gp_flags) = 1;
>> 18     do_force_quiescent_state();
>> 19     ACCESS_ONCE(gp_flags) = 0;
>> 20   }
>> 21   raw_spin_unlock(&rnp_old->fqslock);
>> 22 }
>>
>> I understand the purpose and most of the implementation of the code.
>> But one thing I don't really understand is why we need line 16 at
>> all. By the time it reaches line 16, that particular process must
>> have already acquired the fqslock of the root node, and it should be
>> the only one to reach that point. So it will always see gp_flags == 0
>> when reaching line 16.
>>
>> Did I miss anything? I read Quick Quiz 7.21 and it seems that there
>> might be some tricky things going on there.
>
> Within the confines of this particular example, you miss nothing.
>
> I simplified the code from that in the Linux kernel, which you can see at
> http://elixir.free-electrons.com/linux/latest/source/kernel/rcu/tree.c#L2942

I checked the kernel source code and now I understand how this
technique is implemented. One thing that you might find interesting is
that the code in `rcu_gp_kthread_wake()' only checks the flag using
`!READ_ONCE(rsp->gp_flags)', but I think for consistency it should be
something like `(READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS)' ...?

> In that code, force_quiescent_state() doesn't clear ->gp_flags itself;
> it instead invokes rcu_gp_kthread_wake() to wake up another kernel
> thread. This means that the second CPU can acquire the root-level
> ->fqslock and see ->gp_flags equal to 1.
>
> So how should I fix this example?
>
> One approach would be to drop the check on line 16, as reflected in
> your question. The downside of this approach is that two closely
> spaced calls to force_quiescent_state() might needlessly both call
> do_force_quiescent_state() -- the first call would service both
> requests, so the second call would be pure overhead. But perhaps
> that is too detailed a consideration.
>
> Another approach would be to move line 19 to follow line 21, possibly
> with a timed wait in between. The idea here is that you never need to
> invoke do_force_quiescent_state() more than (say) ten times a second,
> so the "winner" waits a tenth of a second before clearing gp_flags.
>
> A third approach would be to add the wake-up code from the Linux kernel
> back into the example.
>
> Right now, the timed-wait approach seems best, as it is simple, yet
> teaches the point of this technique. But does anyone have a better idea?

I think we should choose the timed-wait approach. The third approach
is not a wise option as far as the book is concerned.
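To make sure I understand the timed-wait approach correctly, here is a
rough sketch of what the modified listing might look like. This is only
my reading of your suggestion: msleep(100) stands in for whatever delay
primitive the example would actually use, and the early return on
gp_flags is just one way I can see of keeping the clearing of gp_flags
on the winner's path only:

void force_quiescent_state(struct rcu_node *rnp_leaf)
{
  int ret;
  struct rcu_node *rnp = rnp_leaf;
  struct rcu_node *rnp_old = NULL;

  /* Lines 7-15 of the listing are unchanged: try-lock our way up the
   * tree, giving up as soon as gp_flags is set or a lock is held. */
  for (; rnp != NULL; rnp = rnp->parent) {
    ret = (ACCESS_ONCE(gp_flags)) ||
          !raw_spin_trylock(&rnp->fqslock);
    if (rnp_old != NULL)
      raw_spin_unlock(&rnp_old->fqslock);
    if (ret)
      return;
    rnp_old = rnp;
  }

  /* We hold the root ->fqslock. If someone else already set gp_flags,
   * their invocation covers our request, so just drop the lock. */
  if (ACCESS_ONCE(gp_flags)) {
    raw_spin_unlock(&rnp_old->fqslock);
    return;
  }

  ACCESS_ONCE(gp_flags) = 1;
  do_force_quiescent_state();
  raw_spin_unlock(&rnp_old->fqslock); /* old line 21 */
  msleep(100);                        /* timed wait: ~ten calls/second at most */
  ACCESS_ONCE(gp_flags) = 0;          /* old line 19, now after the unlock */
}

If I got that right, any caller arriving during the winner's 100 ms
window sees gp_flags != 0 on line 8 and returns without touching a
single lock, so the point of the technique is still on display.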
The first approach might also be an option, but after a few seconds of
thought, I don't think it is "sophisticated" enough ;-)

Thanks,
Yubin