Thanks Paul,

2017-10-22 1:45 GMT+08:00 Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
> On Sat, Oct 21, 2017 at 09:58:12PM +0800, Yubin Ruan wrote:
>> Hi,
>>
>> In Listing 7.7, a hierarchical/conditional locking example is used to
>> show how to reduce lock contention:
>>
>>  1 void force_quiescent_state(struct rcu_node *rnp_leaf)
>>  2 {
>>  3   int ret;
>>  4   struct rcu_node *rnp = rnp_leaf;
>>  5   struct rcu_node *rnp_old = NULL;
>>  6
>>  7   for (; rnp != NULL; rnp = rnp->parent) {
>>  8     ret = (ACCESS_ONCE(gp_flags)) ||
>>  9           !raw_spin_trylock(&rnp->fqslock);
>> 10     if (rnp_old != NULL)
>> 11       raw_spin_unlock(&rnp_old->fqslock);
>> 12     if (ret)
>> 13       return;
>> 14     rnp_old = rnp;
>> 15   }
>> 16   if (!ACCESS_ONCE(gp_flags)) {
>> 17     ACCESS_ONCE(gp_flags) = 1;
>> 18     do_force_quiescent_state();
>> 19     ACCESS_ONCE(gp_flags) = 0;
>> 20   }
>> 21   raw_spin_unlock(&rnp_old->fqslock);
>> 22 }
>>
>> I understand the purpose and most of the implementation of the code.
>> But one thing I don't really understand is why we need line 16 at
>> all. By the time it reaches line 16, that particular process must
>> have already acquired the fqslock of the root node, and it should be
>> the only one to reach that point. So it will always see gp_flags == 0
>> when reaching line 16.
>>
>> Did I miss anything? I read Quick Quiz 7.21 and it seems that there
>> might be some tricky things going on there.
>
> Within the confines of this particular example, you miss nothing.
>
> I simplified the code from that in the Linux kernel, which you can see at
> http://elixir.free-electrons.com/linux/latest/source/kernel/rcu/tree.c#L2942

I checked the kernel source code and now I understand how this
technique is implemented. One thing that you might find interesting is
that the code in `rcu_gp_kthread_wake()' only checks the flag using
`!READ_ONCE(rsp->gp_flags)', but I think for consistency it should be
something like `(READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS)' ...?

> In that code, force_quiescent_state() doesn't clear ->gp_flags itself;
> it instead invokes rcu_gp_kthread_wake() to wake up another kernel
> thread. This means that the second CPU can acquire the root-level
> ->fqslock and see ->gp_flags equal to 1.
>
> So how should I fix this example?
>
> One approach would be to drop the check on line 16, as reflected in
> your question. The downside of this approach is that two closely
> spaced calls to force_quiescent_state() might needlessly both call
> do_force_quiescent_state() -- the first call would service both
> requests, so the second call would be pure overhead. But perhaps
> that is too detailed a consideration.
>
> Another approach would be to move line 19 to follow line 21, possibly
> with a timed wait in between. The idea here is that you never need to
> invoke do_force_quiescent_state() more than (say) ten times a second,
> so the "winner" waits a tenth of a second before clearing gp_flags.
>
> A third approach would be to add the wake-up code from the Linux kernel
> back into the example.
>
> Right now, the timed-wait approach seems best, as it is simple, yet
> teaches the point of this technique. But does anyone have a better idea?

I think we should choose the timed-wait approach. The third approach
is not a wise option as far as the book is concerned.
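To make sure I understand the timed-wait approach correctly, here is a
rough sketch of what the modified listing might look like. This is only
my reading of your suggestion: msleep(100) stands in for whatever delay
primitive the example would actually use, and the early return on
gp_flags is just one way I can see of keeping the clearing of gp_flags
on the winner's path only:

void force_quiescent_state(struct rcu_node *rnp_leaf)
{
  int ret;
  struct rcu_node *rnp = rnp_leaf;
  struct rcu_node *rnp_old = NULL;

  /* Lines 7-15 of the listing are unchanged: try-lock our way up the
   * tree, giving up as soon as gp_flags is set or a lock is held. */
  for (; rnp != NULL; rnp = rnp->parent) {
    ret = (ACCESS_ONCE(gp_flags)) ||
          !raw_spin_trylock(&rnp->fqslock);
    if (rnp_old != NULL)
      raw_spin_unlock(&rnp_old->fqslock);
    if (ret)
      return;
    rnp_old = rnp;
  }

  /* We hold the root ->fqslock. If someone else already set gp_flags,
   * their invocation covers our request, so just drop the lock. */
  if (ACCESS_ONCE(gp_flags)) {
    raw_spin_unlock(&rnp_old->fqslock);
    return;
  }

  ACCESS_ONCE(gp_flags) = 1;
  do_force_quiescent_state();
  raw_spin_unlock(&rnp_old->fqslock); /* old line 21 */
  msleep(100);                        /* timed wait: ~ten calls/second at most */
  ACCESS_ONCE(gp_flags) = 0;          /* old line 19, now after the unlock */
}

If I got that right, any caller arriving during the winner's 100 ms
window sees gp_flags != 0 on line 8 and returns without touching a
single lock, so the point of the technique is still on display.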
The first approach might also be an option, but after a few seconds of
thought, I don't think it is "sophisticated" enough ;-)

Thanks,
Yubin