On 8/7/21 3:56 AM, Hao Xu wrote:
> On 2021/8/6 10:27 PM, Jens Axboe wrote:
>> On Thu, Aug 5, 2021 at 4:05 AM Hao Xu <haoxu@xxxxxxxxxxxxxxxxx> wrote:
>>>
>>> acct->nr_workers is accessed without lock protection. Consider the
>>> case where two callers call io_wqe_wake_worker() in parallel on two
>>> cpus: one is the original context and the other is an io-worker (by
>>> calling io_wqe_enqueue(wqe, linked)). This may cause nr_workers to
>>> become larger than max_workers.
>>> Let's fix it by taking the lock around the check, and let's do
>>> nr_workers++ before create_io_worker(). There may be an edge case
>>> where the first caller fails to create an io-worker, but the second
>>> caller doesn't know that and gives up creating an io-worker as well:
>>>
>>> say nr_workers = max_workers - 1
>>> cpu 0                           cpu 1
>>> io_wqe_wake_worker()            io_wqe_wake_worker()
>>>    nr_workers < max_workers
>>>    nr_workers++
>>>    create_io_worker()              nr_workers == max_workers
>>>      failed                        return
>>>    return
>>>
>>> But the chance of this case is very slim.
>>>
>>> Fixes: 685fe7feedb9 ("io-wq: eliminate the need for a manager thread")
>>> Signed-off-by: Hao Xu <haoxu@xxxxxxxxxxxxxxxxx>
>>> ---
>>>  fs/io-wq.c | 17 ++++++++++++-----
>>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/fs/io-wq.c b/fs/io-wq.c
>>> index cd4fd4d6268f..88d0ba7be1fb 100644
>>> --- a/fs/io-wq.c
>>> +++ b/fs/io-wq.c
>>> @@ -247,9 +247,14 @@ static void io_wqe_wake_worker(struct io_wqe *wqe, struct io_wqe_acct *acct)
>>>  	ret = io_wqe_activate_free_worker(wqe);
>>>  	rcu_read_unlock();
>>>
>>> -	if (!ret && acct->nr_workers < acct->max_workers) {
>>> -		atomic_inc(&acct->nr_running);
>>> -		atomic_inc(&wqe->wq->worker_refs);
>>> +	if (!ret) {
>>> +		raw_spin_lock_irq(&wqe->lock);
>>> +		if (acct->nr_workers < acct->max_workers) {
>>> +			atomic_inc(&acct->nr_running);
>>> +			atomic_inc(&wqe->wq->worker_refs);
>>> +			acct->nr_workers++;
>>> +		}
>>> +		raw_spin_unlock_irq(&wqe->lock);
>>>  		create_io_worker(wqe->wq, wqe, acct->index);
>>>  	}
>>>  }
>>
>> There's a pretty grave bug in this patch, in that you now call
>> create_io_worker() unconditionally. This causes obvious problems with
>> misaccounting, and stalls that hit the idle timeout...
>>
> This is surely a silly mistake, I'll check this patch and the 3/3 again.

Please do - and please always run the full set of tests before sending
out changes like this. You would have seen the slower runs and/or
timeouts from the regression suite. I ended up wasting time on this
thinking it was a change I made that broke it, before then debugging
this one.

-- 
Jens Axboe
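
For illustration only, here is a minimal userspace sketch of the
accounting pattern discussed above: check the limit and reserve the slot
under the lock, and enter the create path only if a slot was actually
reserved. It is not the kernel code and not necessarily the fix that was
eventually applied. The names acct_model, wake_worker, create_ok and
create_fail are hypothetical, a pthread mutex stands in for wqe->lock,
and giving the slot back on a failed create is an assumption added here
to cover the edge case from the commit message.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-class worker accounting. */
struct acct_model {
	pthread_mutex_t lock;	/* stands in for wqe->lock */
	int nr_workers;
	int max_workers;
	int create_calls;	/* how often the create path was entered */
};

/* Hypothetical create hook; returns false to simulate a failed creation. */
typedef bool (*create_fn)(struct acct_model *acct);

static bool create_ok(struct acct_model *acct)
{
	pthread_mutex_lock(&acct->lock);
	acct->create_calls++;
	pthread_mutex_unlock(&acct->lock);
	return true;
}

static bool create_fail(struct acct_model *acct)
{
	pthread_mutex_lock(&acct->lock);
	acct->create_calls++;
	pthread_mutex_unlock(&acct->lock);
	return false;
}

/* Returns true only if a new worker was accounted for and created. */
static bool wake_worker(struct acct_model *acct, create_fn create)
{
	bool reserved = false;

	/* Check and increment in one critical section, so two callers
	 * cannot both pass the limit check for the last free slot. */
	pthread_mutex_lock(&acct->lock);
	if (acct->nr_workers < acct->max_workers) {
		acct->nr_workers++;
		reserved = true;
	}
	pthread_mutex_unlock(&acct->lock);

	/* The point raised in the review: no reserved slot, no create. */
	if (!reserved)
		return false;

	if (!create(acct)) {
		/* Assumption: on failure, return the slot so a later
		 * caller can retry instead of seeing a full pool. */
		pthread_mutex_lock(&acct->lock);
		acct->nr_workers--;
		pthread_mutex_unlock(&acct->lock);
		return false;
	}
	return true;
}
```

With max_workers set to 2, a third wake_worker() call returns false
without entering the create hook, so nr_workers and the number of
creations stay in step.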