On 2023-03-23 09:16:27 [+0100], Krzysztof Kozlowski wrote: > > Yeah closer :) The CPU-mask for workqueues can still be different on > > non-NOHZ-full CPUs. Still you interrupt the CPU doing in-userland work > > and this is not desired. > > Probably this should be done by workqueue core code. Individual drivers > should not need to investigate which CPUs are isolated. _Either_ this is part of the interrupt service routine or it is not. Sometimes work can be offloaded. However this interrupt is short and offloads work to a workqueue. Can the interrupt be moved to another CPU without breaking something? The use can only change the CPUs on which the workqueue can run but also the affinity of the IRQ itself. If the user wishes to isolate CPU X and move workqueues and interrupts away from the CPU the question is why is this a problem for you. > > You have a threaded-IRQ which does nothing but schedules a worker. Why? > > Why not sleep and remain in that threaded IRQ until the work is done? > > You _can_ sleep in the threaded IRQ if you have to. Force-threaded is > > different but this is one is explicit threaded so you could do it. > > If I get your point correctly, you want the IRQ handler thread to do the > actual work instead of scheduling work? The answer to this is probably here: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e0e27c3d4e20dab861566f1c348ae44e4b498630 Let me look. | Re-enabling an interrupt from its own interrupt handler may cause | an interrupt storm, if there is a pending interrupt and because its | handling is disabled due to already done entrance into the handler | above in the stack. I have hard time parsing this. disable_irq_nosync() and enable enable_irq() shouldn't be invoked from within the interrupt handler itself. This interrupt is already requested as a threaded handler with IRQF_ONESHOT. This means the IRQ-chip already disables the interrupt while the threaded handler is invoked. No need for that. I don't know what the purpose of reg_intr_clr here is. Acking the interrupt before reading the status and doing any work does not look right. | Also, apparently it is improper to lock a mutex in an interrupt contex. Again, this is an interrupt handler requested as a threaded handler. This handler is invoked with enabled interrupts and preemption. The code in this handler can invoke ssleep() or mutex_lock(). A might_sleep() does not produce a plat here. It okay to acquire a mutex. This is why we have threaded interrupts. You can't acquire a mutex in a forced-threaded handler. This is not the case here. > > > >>> However the thermal notifications have nothing to do with cpufreq. > >> > >> They have. The FW notifies that thermal mitigation is happening and > >> maximum allowed frequency is now XYZ. The cpufreq receives this and sets > >> maximum allowed scaling frequency for governor. > > > > I see. So the driver is doing something in worst case. This interrupt, > > you have per-CPU and you need to do this CPU? I mean could you change > > the affinity of the interrupt to another CPU? > > I don't know. The commit introducing it: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3ed6dfbd3bb987b3d2de86304ae45972ebff5870 > claimed it helps to reduce number of interrupts hitting CPU 10x-100x > times... I don't see it - neither in tests nor in the code, so I am just > thinking to revert that one. So it may run on another CPU but doing it on the right cluster reduces the received interrupt 10-100 times. Do we have per-cluster register or is the interrupt ACK wrong and this what is observed? The questions are: - What is the interrupt signaling. - What must be done to acknowledge the interrupt. This needs to be figured out and verified that it actually works as intended. The hardware might keep sending interrupt because the source is either not acknowledge properly or the source of the interrupt (the reason why it was generated in first place) is still pending/ not handled. The changes you reference look like "if we do this then it seems better.". No explanation about the issue/ root cause and the targeted solution. > Best regards, > Krzysztof Sebastian