Hello,

On Thu, Apr 07, 2022 at 06:33:35PM +0800, Qi Zheng wrote:
> In the percpu_ref_call_confirm_rcu(), we call the wake_up_all()
> before calling percpu_ref_put(), which will cause the value of
> percpu_ref to be unstable when percpu_ref_switch_to_atomic_sync()
> returns.
>
>   CPU0                                  CPU1
>
>   percpu_ref_switch_to_atomic_sync(&ref)
>   --> percpu_ref_switch_to_atomic(&ref)
>       --> percpu_ref_get(ref);  /* put after confirmation */
>           call_rcu(&ref->data->rcu, percpu_ref_switch_to_atomic_rcu);
>
>                                         percpu_ref_switch_to_atomic_rcu
>                                         --> percpu_ref_call_confirm_rcu
>                                             --> data->confirm_switch = NULL;
>                                                 wake_up_all(&percpu_ref_switch_waitq);
>
>       /* here waiting to wake up */
>       wait_event(percpu_ref_switch_waitq, !ref->data->confirm_switch);
>                                                 (A)percpu_ref_put(ref);
>   /* The value of &ref is unstable! */
>   percpu_ref_is_zero(&ref)
>                                                 (B)percpu_ref_put(ref);
>
> As shown above, assuming that the counts on each cpu add up to 0 before
> calling percpu_ref_switch_to_atomic_sync(), we expect that after switching
> to atomic mode, percpu_ref_is_zero() can return true. But actually it will
> return different values in the two cases of A and B, which is not what
> we expected.
>
> Maybe the original purpose of percpu_ref_switch_to_atomic_sync() is
> just to ensure that the conversion to atomic mode is completed, but it
> should not return with an extra reference count.
>
> Calling wake_up_all() after percpu_ref_put() ensures that the value of
> percpu_ref is stable after percpu_ref_switch_to_atomic_sync() returns.
> So just do it.
>
> Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
> ---
>  lib/percpu-refcount.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
> index af9302141bcf..b11b4152c8cd 100644
> --- a/lib/percpu-refcount.c
> +++ b/lib/percpu-refcount.c
> @@ -154,13 +154,14 @@ static void percpu_ref_call_confirm_rcu(struct rcu_head *rcu)
>
>  	data->confirm_switch(ref);
>  	data->confirm_switch = NULL;
> -	wake_up_all(&percpu_ref_switch_waitq);
>
>  	if (!data->allow_reinit)
>  		__percpu_ref_exit(ref);
>
>  	/* drop ref from percpu_ref_switch_to_atomic() */
>  	percpu_ref_put(ref);
> +
> +	wake_up_all(&percpu_ref_switch_waitq);

The interface, at least originally, doesn't give any guarantee over
whether there's gonna be a residual reference on it or not. There's
nothing necessarily wrong with guaranteeing that but it's rather unusual
and given that putting the base ref in a percpu_ref is a special "kill"
operation and a ref in percpu mode always returns %false on is_zero(),
I'm not quite sure how such semantics would be useful. Do you care to
explain the use case with concrete examples?

Also, the proposed patch is racy. There's nothing preventing
percpu_ref_switch_to_atomic_sync() from waking up early between
confirm_switch clearing and the wake_up_all, so the above change doesn't
guarantee what it tries to guarantee. For that, you'd have to move
confirm_switch clearing *after* percpu_ref_put() but then, you'd be
accessing the ref after its final ref is put which can lead to
use-after-free.

In fact, the whole premise seems wrong. The switching needs a reference
to the percpu_ref because it is accessing it asynchronously. The
switching side doesn't know when the ref is gonna go away once it puts
its reference and thus can't signal that they're done after putting
their reference.

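To make the early-wakeup window concrete, here is a minimal userspace
sketch, not kernel code. It walks through the patched callback one step
at a time and reports what a waiter would see if it evaluated its
wait_event() condition at that point. struct model_ref and all function
names are hypothetical stand-ins invented for this illustration, and the
starting state assumes, as in the scenario quoted above, that only the
reference taken for the switch is still outstanding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the state discussed above. */
struct model_ref {
	long count;		/* stands in for the atomic count */
	bool confirm_pending;	/* stands in for data->confirm_switch != NULL */
};

/* The condition the waiter passes to wait_event(). */
static bool waiter_may_return(const struct model_ref *ref)
{
	return !ref->confirm_pending;
}

/* Patched callback, step 1: data->confirm_switch = NULL. */
static void step_clear_confirm(struct model_ref *ref)
{
	ref->confirm_pending = false;
}

/* Patched callback, step 2: percpu_ref_put(). */
static void step_put(struct model_ref *ref)
{
	assert(ref->count > 0);
	ref->count--;
}

/*
 * Count observed by a waiter that evaluates its condition after
 * `steps_done` callback steps have run (0, 1 or 2). Returns -1 if the
 * condition is still false, i.e. the waiter would keep sleeping.
 */
long count_seen_by_waiter(int steps_done)
{
	/* Assumption: only the reference taken for the switch is left. */
	struct model_ref ref = { .count = 1, .confirm_pending = true };

	if (steps_done >= 1)
		step_clear_confirm(&ref);
	if (steps_done >= 2)
		step_put(&ref);

	return waiter_may_return(&ref) ? ref.count : -1;
}
```

In this model count_seen_by_waiter(1) returns 1 and
count_seen_by_waiter(2) returns 0: a waiter that checks its condition
between the two steps can already return while the residual reference is
still there, regardless of where wake_up_all() is placed.
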
We *can* make that work by putting the whole thing in its own critical
section so that we can make confirm_switch clearing atomic with the
possibly final put, but that's gonna add some complexity and begs the
question why we'd need such a thing.

Andrew, I don't think the patch as proposed makes much sense. Maybe it'd
be better to keep it out of the tree for the time being?

Thanks.

-- 
tejun