On Mon, May 02, 2022 at 09:08:31PM +0200, Uladzislau Rezki (Sony) wrote: > Motivation of backport: > ----------------------- > > 1. The cfcdef5e30469 ("rcu: Allow rcu_do_batch() to dynamically adjust batch sizes") > broke the default behaviour of "offloading rcu callbacks" setup. In that scenario > after each callback the caller context was used to check if it has to be rescheduled > giving a CPU time for others. After that change an "offloaded" setup can switch to > time-based RCU callbacks processing, what can be long for latency sensitive workloads > and SCHED_FIFO processes, i.e. callbacks are invoked for a long time with keeping > preemption off and without checking cond_resched(). > > 2. Our devices which run Android and 5.10 kernel have some critical areas which > are sensitive to latency. It is a low latency audio, 8k video, UI stack and so on. > For example below is a trace that illustrates a delay of "irq/396-5-0072" RT task > to complete IRQ processing: > > <snip> > rcuop/6-54 [000] d.h2 183.752989: irq_handler_entry: irq=85 name=i2c_geni > rcuop/6-54 [000] d.h5 183.753007: sched_waking: comm=irq/396-5-0072 pid=12675 prio=49 target_cpu=000 > rcuop/6-54 [000] dNh6 183.753014: sched_wakeup: irq/396-5-0072:12675 [49] success=1 CPU:000 > rcuop/6-54 [000] dNh2 183.753015: irq_handler_exit: irq=85 ret=handled > rcuop/6-54 [000] .N.. 183.753018: rcu_invoke_callback: rcu_preempt rhp=0xffffff88ffd440b0 func=__d_free.cfi_jt > rcuop/6-54 [000] .N.. 183.753020: rcu_invoke_callback: rcu_preempt rhp=0xffffff892ffd8400 func=inode_free_by_rcu.cfi_jt > rcuop/6-54 [000] .N.. 183.753021: rcu_invoke_callback: rcu_preempt rhp=0xffffff89327cd708 func=i_callback.cfi_jt > ... > rcuop/6-54 [000] .N.. 183.755941: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c5a968 func=i_callback.cfi_jt > rcuop/6-54 [000] .N.. 183.755942: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c4bd20 func=__d_free.cfi_jt > rcuop/6-54 [000] dN.. 183.755944: rcu_batch_end: rcu_preempt CBs-invoked=2112 idle=>c<>c<>c<>c< > rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: Start context switch > rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: End context switch > rcuop/6-54 [000] d..2 183.755959: sched_switch: rcuop/6:54 [120] R ==> migration/0:16 [0] > ... > migratio-16 [000] d..2 183.756021: sched_switch: migration/0:16 [0] S ==> irq/396-5-0072:12675 [49] > <snip> > > The "irq/396-5-0072:12675" was delayed for ~3 milliseconds due to introduced side effect. > Please note, on our Android devices we get ~70 000 callbacks registered to be invoked by > the "rcuop/x" workers. This is during 1 seconds time interval and regular handset usage. > Latencies bigger that 3 milliseconds affect our high-resolution audio streaming over the > LDAC/Bluetooth stack. > > Two patches depend on each other. One meta-comment. We can't apply changes to older kernels and not newer ones, as you do not want to upgrade your kernel and suffer a regression. This patch series comes from 5.17, but you are backporting to only 5.10. What about 5.15? I can't consider this series unless we have a series also for 5.15 for that reason, we have to keep in sync otherwise things get unmaintainable. So, have a 5.15 backport as well? Also, you forgot to cc: the developers of the patches in this 0/X email, that just causes confusion for those that do not receive this message. thanks, greg k-h