On Thu, Aug 5, 2021 at 4:05 AM Hao Xu <haoxu@xxxxxxxxxxxxxxxxx> wrote:
>
> There is an acct->nr_worker visit without lock protection. Think about
> the case: two callers call io_wqe_wake_worker(), one is the original
> context and the other one is an io-worker (by calling
> io_wqe_enqueue(wqe, linked)), on two cpus in parallel; this may cause
> nr_worker to be larger than max_worker.
> Let's fix it by adding a lock for it, and let's do nr_workers++ before
> create_io_worker. There may be an edge case where the first caller fails
> to create an io-worker, but the second caller doesn't know it and then
> quits creating an io-worker as well:
>
> say nr_worker = max_worker - 1
>         cpu 0                          cpu 1
>    io_wqe_wake_worker()           io_wqe_wake_worker()
>      nr_worker < max_worker
>      nr_worker++
>      create_io_worker()             nr_worker == max_worker
>        failed                       return
>      return
>
> But the chance of this case is very slim.
>
> Fixes: 685fe7feedb9 ("io-wq: eliminate the need for a manager thread")
> Signed-off-by: Hao Xu <haoxu@xxxxxxxxxxxxxxxxx>
> ---
>  fs/io-wq.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/fs/io-wq.c b/fs/io-wq.c
> index cd4fd4d6268f..88d0ba7be1fb 100644
> --- a/fs/io-wq.c
> +++ b/fs/io-wq.c
> @@ -247,9 +247,14 @@ static void io_wqe_wake_worker(struct io_wqe *wqe, struct io_wqe_acct *acct)
>         ret = io_wqe_activate_free_worker(wqe);
>         rcu_read_unlock();
>
> -       if (!ret && acct->nr_workers < acct->max_workers) {
> -               atomic_inc(&acct->nr_running);
> -               atomic_inc(&wqe->wq->worker_refs);
> +       if (!ret) {
> +               raw_spin_lock_irq(&wqe->lock);
> +               if (acct->nr_workers < acct->max_workers) {
> +                       atomic_inc(&acct->nr_running);
> +                       atomic_inc(&wqe->wq->worker_refs);
> +                       acct->nr_workers++;
> +               }
> +               raw_spin_unlock_irq(&wqe->lock);
>                 create_io_worker(wqe->wq, wqe, acct->index);
>         }
>  }

There's a pretty grave bug in this patch, in that you now call
create_io_worker() unconditionally.
This causes obvious problems with misaccounting, and stalls that hit the
idle timeout...

-- 
Jens Axboe
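One way to avoid calling the creation routine unconditionally can be sketched in user space as follows: reserve a slot under the lock, remember whether the reservation succeeded, and only then create the worker, giving the slot back if creation fails. This is a minimal sketch under stated assumptions, not io-wq code: a pthread mutex stands in for raw_spin_lock_irq(&wqe->lock), the nr_running and worker_refs counters are left out, and struct acct, acct_init(), wake_worker(), create_ok() and create_fail() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct io_wqe_acct plus the lock guarding it. */
struct acct {
	pthread_mutex_t lock;
	int nr_workers;
	int max_workers;
};

/* Hypothetical worker constructor type; returns false on failure. */
typedef bool (*create_fn)(void *arg);

static void acct_init(struct acct *acct, int max_workers)
{
	pthread_mutex_init(&acct->lock, NULL);
	acct->nr_workers = 0;
	acct->max_workers = max_workers;
}

/*
 * Check and increment happen under one lock hold, so two concurrent
 * callers cannot both pass the limit check. The worker is created only
 * when this caller actually reserved a slot.
 */
static bool wake_worker(struct acct *acct, create_fn create, void *arg)
{
	bool do_create = false;

	pthread_mutex_lock(&acct->lock);
	if (acct->nr_workers < acct->max_workers) {
		acct->nr_workers++;	/* reserve the slot before creating */
		do_create = true;
	}
	pthread_mutex_unlock(&acct->lock);

	if (!do_create)
		return false;	/* at the limit: must NOT create a worker */

	if (!create(arg)) {
		/* Creation failed: return the reserved slot. */
		pthread_mutex_lock(&acct->lock);
		assert(acct->nr_workers > 0);
		acct->nr_workers--;
		pthread_mutex_unlock(&acct->lock);
		return false;
	}
	return true;
}

/* Hypothetical demo constructors: count invocations via *arg. */
static bool create_ok(void *arg)
{
	(*(int *)arg)++;
	return true;
}

static bool create_fail(void *arg)
{
	(*(int *)arg)++;
	return false;
}
```

The essential difference from the quoted hunk is that the creation call sits behind do_create, so a caller that finds the limit reached never creates a worker it did not account for. The rollback keeps the count correct when creation fails, but it does not close the window Hao describes: a concurrent caller that already saw the limit has returned and will not retry.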