在 2021/9/12 上午6:13, Jens Axboe 写道:
On 9/11/21 1:40 PM, Hao Xu wrote:
We may enter the worker creation path from io_worker_exit(), and
refcount is already zero, which causes definite failure of worker
creation.
io_worker_exit
ref = 0
->io_wqe_dec_running
->io_queue_worker_create
->io_worker_get failure since ref is 0
Signed-off-by: Hao Xu <haoxu@xxxxxxxxxxxxxxxxx>
---
fs/io-wq.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/io-wq.c b/fs/io-wq.c
index 0e1288a549eb..75e79571bdfd 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -188,7 +188,9 @@ static void io_worker_exit(struct io_worker *worker)
list_del_rcu(&worker->all_list);
acct->nr_workers--;
preempt_disable();
+ refcount_set(&worker->ref, 1);
io_wqe_dec_running(worker);
+ refcount_set(&worker->ref, 0);
That kind of refcount manipulation is highly suspect. And in fact it
should not be needed, io_worker_exit() clears ->flags before going on
with worker teardown. Hence you can't hit worker creation from
io_wqe_dec_running().
Doesn't see the relationship between worker->flags and the creation.
But yes, the creation path does io_worker_get() which causes failure
if it's from io_worker_exit(), Now I understand it is more like a
feature, isn't it? Anyway, the issue in 4/4 seems still there.
Regards,
Hao