Hello Kuai, Thank you for your comments and questions. About our tests: - we used many (>= 100) logical volumes (each 100 GiB big) on top of 3 x md-raid5s (making up a raid50) on 8 SSDs each. - further we started a fio instance doing random rd/wr (also of random sizes) on each of those volumes, And, yes indeed, as you suggested, we observed some performance change already after adding (first) wq_has_sleeper(). The fio IOPS results improved by around 3-4% - but, in my experience, every fluctuation <= 3% can also have other reasons (e.g. different timings, temperature on SSDs, etc.). Only afterwards, I've also changed the wait queue construct with the swait variant (sufficient here), as its wake up path is simpler, faster and with a smaller stack footprint. Also used the (now since some years available) IDLE state for the waiting thread to eliminate the (not anymore necessary) checking and flushing of signals. This second round of changes, although theoretically an improvement, didn't bring any measurable performance increase, though. This was more or less expected. If you think, the simple adding of the wq_has_sleeper() is more justifiable, please apply only this change. Later today, I'll also send you a patch containing only this simple change. Best regards, Florian On Tue, Mar 26, 2024 at 3:09 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > > Hi, > > 在 2024/03/26 13:22, Jinpu Wang 写道: > > Hi Song, hi Kuai, > > > > ping, Any comments? > > > > Thx! > > > > On Thu, Mar 7, 2024 at 1:08 PM Jack Wang <jinpu.wang@xxxxxxxxx> wrote: > >> > >> From: Florian-Ewald Mueller <florian-ewald.mueller@xxxxxxxxx> > >> > >> Replace md_thread's wait_event()/wake_up() related calls with their > >> simple swait~ variants to improve performance as well as memory and > >> stack footprint. > >> > >> Use the IDLE state for the worker thread put to sleep instead of > >> misusing the INTERRUPTIBLE state combined with flushing signals > >> just for not contributing to system's cpu load-average stats. > >> > >> Also check for sleeping thread before attempting its wake_up in > >> md_wakeup_thread() for avoiding unnecessary spinlock contention. > > I think it'll be better to split this into a seperate patch. > And can you check if we just add wq_has_sleeper(), will there be > performance improvement? > > >> > >> With this patch (backported) on a kernel 6.1, the IOPS improved > >> by around 4% with raid1 and/or raid5, in high IO load scenarios. > > Can you be more specifical about your test? because from what I know, > IO fast path doesn't involved with daemon thread, and I don't understand > yet where the 4% improvement is from. > > Thanks, > Kuai >