On Fri, Jan 13, 2023 at 1:11 PM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> On Fri, Jan 13, 2023 at 01:07:02PM -0800, Nathan Huckleberry wrote:
> > Add a WQ flag that allows workqueues to use SCHED_FIFO with the least
> > important RT priority. This can reduce scheduler latency for IO
> > post-processing when the CPU is under load without impacting other RT
> > workloads. This has been shown to improve app startup time on Android
> > [1].
> >
> > Scheduler latency affects several drivers as evidenced by [1], [2], [3],
> > [4]. Some of these drivers have moved post-processing into IRQ context.
> > However, this can cause latency spikes for real-time threads and jitter
> > related jank on Android. Using a workqueue with SCHED_FIFO improves
> > scheduler latency without causing latency problems for RT threads.
> >
> > [1]: https://lore.kernel.org/linux-erofs/20230106073502.4017276-1-dhavale@xxxxxxxxxx/
> > [2]: https://lore.kernel.org/linux-f2fs-devel/20220802192437.1895492-1-daeho43@xxxxxxxxx/
> > [3]: https://lore.kernel.org/dm-devel/20220722093823.4158756-4-nhuck@xxxxxxxxxx/
> > [4]: https://lore.kernel.org/dm-crypt/20200706173731.3734-1-ignat@xxxxxxxxxxxxxx/
> >
> > This change has been tested on dm-verity with the following fio config:
> >
> > [global]
> > time_based
> > runtime=120
> >
> > [do-verify]
> > ioengine=sync
> > filename=/dev/testing
> > rw=randread
> > direct=1
> >
> > [burn_8x90%_qsort]
> > ioengine=cpuio
> > cpuload=90
> > numjobs=8
> > cpumode=qsort
> >
> > Before:
> > clat (usec): min=13, max=23882, avg=29.56, stdev=113.29
> > READ: bw=122MiB/s (128MB/s), 122MiB/s-122MiB/s (128MB/s-128MB/s), io=14.3GiB
> > (15.3GB), run=120001-120001msec
> >
> > After:
> > clat (usec): min=13, max=23137, avg=19.96, stdev=105.71
> > READ: bw=180MiB/s (189MB/s), 180MiB/s-180MiB/s (189MB/s-189MB/s), io=21.1GiB
> > (22.7GB), run=120012-120012msec
>

Hi Tejun,

> Given that its use case mostly intersects with WQ_HIGHPRI, would it make
> more sense to add a switch to alter its behavior instead? I don't really
> like the idea of pushing the decision between WQ_HIGHPRI and WQ_SCHED_FIFO
> to each user.

Do you think something similar should be done for WQ_UNBOUND? In most
places where WQ_HIGHPRI is used, WQ_UNBOUND is also used because it
boosts performance. However, I suspect that most of these benchmarks
were done on x86-64. I've found that WQ_UNBOUND significantly reduces
performance on arm64/Android.

From the documentation, using WQ_UNBOUND for performance doesn't seem
correct. It's only supposed to be used for long-running work. It might
make more sense to get rid of WQ_UNBOUND altogether and only move work
to unbound worker pools once it has stuck around for long enough.

Android will probably need to remove WQ_UNBOUND from all of these
performance-critical users.

If there are performance benefits to unbinding workqueues from CPUs on
x86-64, that should probably be a config flag, not controlled by every
user.

Thanks,
Huck

>
> Thanks.
>
> --
> tejun
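
For anyone following along, here is a rough userspace illustration of what "SCHED_FIFO with the least important RT priority" means. It is not the patch and not kernel code: the helper names lowest_fifo_priority() and become_lowest_fifo() are made up for this sketch, and it only uses the POSIX calls sched_get_priority_min() and sched_setscheduler(). Switching policy normally needs CAP_SYS_NICE, so the sketch reports the error instead of assuming it succeeds.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <sched.h>
#include <string.h>

/*
 * Hypothetical helper: return the least important (numerically lowest)
 * real-time priority the system allows for SCHED_FIFO, or -1 on error.
 */
static int lowest_fifo_priority(void)
{
	return sched_get_priority_min(SCHED_FIFO);
}

/*
 * Hypothetical helper: try to move the calling thread to SCHED_FIFO at
 * the least important RT priority. Returns 0 on success, otherwise the
 * errno value from sched_setscheduler() (commonly EPERM when the caller
 * lacks CAP_SYS_NICE).
 */
static int become_lowest_fifo(void)
{
	struct sched_param sp;

	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = lowest_fifo_priority();
	if (sp.sched_priority == -1)
		return errno;

	if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
		return errno;

	return 0;
}
```

A thread that calls become_lowest_fifo() successfully preempts every SCHED_OTHER task but still yields to any RT task with a higher priority, which is the property the patch description relies on to avoid disturbing other RT workloads.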