During a context switch the scheduler invokes wq_worker_sleeping() with
disabled preemption. Disabling preemption is needed because it protects
access to `worker->sleeping'. As an optimisation it avoids invoking
schedule() within the schedule path as part of a possible wakeup (thus
the preempt_enable_no_resched() afterwards).

The io-wq has been added to the mix in the same preempt-disabled
section. This breaks on PREEMPT_RT because io_wq_worker_sleeping()
acquires a spinlock_t. Also, within schedule() the spinlock_t must be
acquired after tsk_is_pi_blocked(), otherwise the task will block on
the sleeping lock again while scheduling out.

While playing with `io_uring-bench' I didn't notice a significant
latency spike after converting io_wqe::lock to a raw_spinlock_t; the
latency was more or less the same. Still, I don't see a compelling
reason why this lock should become a raw_spinlock_t, therefore I
suggest moving the io_wq_worker_sleeping() invocation after the
tsk_is_pi_blocked() check instead.

io_worker::flags is usually modified under the lock, except in the
scheduler path. Acquiring the lock unconditionally should be fine since
the IO_WORKER_F_UP flag is set early during startup and
IO_WORKER_F_RUNNING should be set unless the task loops within
schedule(). I *think* ::flags requires the same protection as
workqueue's ::sleeping, so I move the checks inside the locked section.

Any feedback on this approach vs. the raw_spinlock_t conversion?

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
---
 fs/io-wq.c          |  8 ++++----
 kernel/sched/core.c | 10 +++++-----
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index e92c4724480ca..a7e07b3ac5b95 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -623,15 +623,15 @@ void io_wq_worker_sleeping(struct task_struct *tsk)
 	struct io_worker *worker = kthread_data(tsk);
 	struct io_wqe *wqe = worker->wqe;
 
+	spin_lock_irq(&wqe->lock);
 	if (!(worker->flags & IO_WORKER_F_UP))
-		return;
+		goto out;
 	if (!(worker->flags & IO_WORKER_F_RUNNING))
-		return;
+		goto out;
 
 	worker->flags &= ~IO_WORKER_F_RUNNING;
-
-	spin_lock_irq(&wqe->lock);
 	io_wqe_dec_running(wqe, worker);
+out:
 	spin_unlock_irq(&wqe->lock);
 }
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3bbb60b97c73c..b76c0f27bd95e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4694,18 +4694,18 @@ static inline void sched_submit_work(struct task_struct *tsk)
 	 * in the possible wakeup of a kworker and because wq_worker_sleeping()
 	 * requires it.
 	 */
-	if (tsk->flags & (PF_WQ_WORKER | PF_IO_WORKER)) {
+	if (tsk->flags & PF_WQ_WORKER) {
 		preempt_disable();
-		if (tsk->flags & PF_WQ_WORKER)
-			wq_worker_sleeping(tsk);
-		else
-			io_wq_worker_sleeping(tsk);
+		wq_worker_sleeping(tsk);
 		preempt_enable_no_resched();
 	}
 
 	if (tsk_is_pi_blocked(tsk))
 		return;
 
+	if (tsk->flags & PF_IO_WORKER)
+		io_wq_worker_sleeping(tsk);
+
 	/*
 	 * If we are going to sleep and we have plugged IO queued,
 	 * make sure to submit it to avoid deadlocks.
-- 
2.28.0