Hi, Ming
I don't think this is a generic issue in percpu_ref. I went through some
users of percpu_ref, such as "part->ref", "blkg->refcnt" and
"ctx->reqs/ctx->users". They all call percpu_ref_exit only after "release"
has finished, which does not cause this problem.
So I think the API (percpu_ref_put_many) should not be changed, and the
user should guarantee this ordering.
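To illustrate the ordering I mean, here is a minimal user-space sketch. It is
not kernel code: struct ref_model, model_put and model_exit are made-up names
that only imitate percpu_ref in atomic mode, and the "released" flag stands in
for whatever completion the real user waits on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct ref_model;
typedef void (*release_fn)(struct ref_model *ref);

/* Hypothetical stand-in for percpu_ref_data: heap state freed by "exit". */
struct ref_data_model {
	atomic_long count;
	release_fn release;
};

struct ref_model {
	struct ref_data_model *data;
	atomic_bool released;	/* set once the release callback has run */
};

static void model_release(struct ref_model *ref)
{
	atomic_store(&ref->released, true);
}

/* Start with one reference held by the owner. Returns 0 on success. */
static int model_init(struct ref_model *ref)
{
	ref->data = malloc(sizeof(*ref->data));
	if (!ref->data)
		return -1;
	atomic_init(&ref->data->count, 1);
	ref->data->release = model_release;
	atomic_init(&ref->released, false);
	return 0;
}

/* Drop one reference; the last put invokes the release callback. */
static void model_put(struct ref_model *ref)
{
	assert(ref->data != NULL);
	if (atomic_fetch_sub(&ref->data->count, 1) == 1)
		ref->data->release(ref);
}

/*
 * User-side guarantee: free the data only after release has run.
 * Returns false (and frees nothing) if the caller has to wait longer.
 */
static bool model_exit(struct ref_model *ref)
{
	if (!atomic_load(&ref->released))
		return false;
	free(ref->data);
	ref->data = NULL;
	return true;
}
```

In this model the owner would retry or sleep until model_exit() returns true,
rather than freeing the data as soon as the counter reads zero.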
Thanks!
Wensheng
On 2022/7/29 21:58, Ming Lei wrote:
On Fri, Jul 29, 2022 at 06:50:36PM +0800, Zhang Wensheng wrote:
From: Zhang Wensheng <zhangwensheng5@xxxxxxxxxx>
A problem was found in stable 5.10, and its root cause is described below.
For the q_usage_counter of request_queue, blk_cleanup_queue uses
"wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter))"
to wait for q_usage_counter to become zero. However, if q_usage_counter
becomes zero quickly, percpu_ref_exit will execute and ref->data
will be freed, and another process may then hit a null-deref problem
like below:
CPU0                                CPU1
blk_cleanup_queue
blk_freeze_queue
blk_mq_freeze_queue_wait
                                    scsi_end_request
                                    percpu_ref_get
                                    ...
                                    percpu_ref_put
                                    atomic_long_sub_and_test
percpu_ref_exit
ref->data -> NULL
                                    ref->data->release(ref) -> null-deref
It looks like a generic issue in percpu_ref. I think the following patch
should address it.
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index d73a1c08c3e3..07308bd36d83 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -331,8 +331,12 @@ static inline void percpu_ref_put_many(struct percpu_ref *ref, unsigned long nr)
 	if (__ref_is_percpu(ref, &percpu_count))
 		this_cpu_sub(*percpu_count, nr);
-	else if (unlikely(atomic_long_sub_and_test(nr, &ref->data->count)))
-		ref->data->release(ref);
+	else {
+		percpu_ref_func_t *release = ref->data->release;
+
+		if (unlikely(atomic_long_sub_and_test(nr, &ref->data->count)))
+			release(ref);
+	}
 	rcu_read_unlock();
 }
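To show the idea outside the kernel, below is a small single-threaded
user-space model of the same change. It is only a sketch: struct ref_model,
simulate_exit and model_put_cached are made-up names, and simulate_exit just
replays the worst-case interleaving from the report (the other CPU clearing
ref->data right after the count hits zero) instead of running real threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct ref_model;
typedef void (*release_fn)(struct ref_model *ref);

/* Hypothetical stand-in for percpu_ref_data. */
struct ref_data_model {
	atomic_long count;
	release_fn release;
};

struct ref_model {
	struct ref_data_model *data;
	int release_calls;	/* lets a caller observe that release ran */
};

static void count_release(struct ref_model *ref)
{
	ref->release_calls++;
}

/* Mimics the other CPU seeing zero and tearing the data down at once. */
static void simulate_exit(struct ref_model *ref)
{
	ref->data = NULL;
}

static void model_put_cached(struct ref_model *ref)
{
	release_fn release;

	assert(ref->data != NULL);
	/* Load the callback while we still hold a reference. */
	release = ref->data->release;

	if (atomic_fetch_sub(&ref->data->count, 1) == 1) {
		simulate_exit(ref);	/* worst-case interleaving */
		release(ref);		/* no ref->data access after zero */
	}
}
```

With the uncached form, the final put in this model would read
ref->data->release after simulate_exit() had already cleared ref->data.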
Thanks,
Ming