On 21/03/2023 14:39, Sebastian Andrzej Siewior wrote:
> On 2023-03-21 12:27:42 [+0100], Krzysztof Kozlowski wrote:
>>> I still fail to understand why this is PREEMPT_RT specific and not a
>>> problem in general when it comes to NO_HZ_FULL and/or CPU isolation.
>>
>> Hm, good point, I actually don't know what the workqueue
>> recommendation for NO_HZ_FULL CPUs is - is locality of the workqueue
>> still preferred?
>
> If you isolate a CPU you want the kernel to stay away from it. The idea
> is that something is done on that CPU and the kernel should leave it
> alone. That is why the HZ tick is avoided. That is why timers migrate to
> the "housekeeping" CPU and do not fire on the CPU that they were
> programmed on (unless the timer has to fire on this CPU).
>
>> And what would such code look like?
>> if (tick_nohz_tick_stopped())?
>
> Yeah closer :) The CPU-mask for workqueues can still be different on
> non-NOHZ-full CPUs. Still you interrupt the CPU doing in-userland work
> and this is not desired.

Probably this should be done by the workqueue core code. Individual
drivers should not need to investigate which CPUs are isolated.

> You have a threaded-IRQ which does nothing but schedule a worker. Why?
> Why not sleep and remain in that threaded IRQ until the work is done?
> You _can_ sleep in the threaded IRQ if you have to. Force-threaded is
> different but this one is explicitly threaded so you could do it.

If I get your point correctly, you want the IRQ handler thread to do the
actual work instead of scheduling work? The answer to this is probably
here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e0e27c3d4e20dab861566f1c348ae44e4b498630

>
>>> However the thermal notifications have nothing to do with cpufreq.
>>
>> They have. The FW notifies that thermal mitigation is happening and
>> maximum allowed frequency is now XYZ. The cpufreq receives this and sets
>> maximum allowed scaling frequency for governor.
>
> I see. So the driver is doing something in worst case. This interrupt,
> you have per-CPU and you need to do this on this CPU? I mean could you
> change the affinity of the interrupt to another CPU?

I don't know. The commit introducing it:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3ed6dfbd3bb987b3d2de86304ae45972ebff5870
claimed it helps to reduce the number of interrupts hitting the CPU by
10x-100x... I don't see it - neither in tests nor in the code, so I am
thinking of just reverting that one.

Best regards,
Krzysztof