On Wed, Jan 17, 2024 at 07:46:28PM -0300, Leonardo Bras wrote: > On Wed, Jan 17, 2024 at 11:08:44PM +0100, Thomas Gleixner wrote: > > On Tue, Jan 16 2024 at 04:36, Leonardo Bras wrote: > > > This IRQ line disable bug can be easily reproduced with a serial8250 > > > console on a PREEMPT_RT kernel: it only takes the user to print a lot > > > of text to the console (or to ttyS0): around 300k chars should be > > > enough. > > > > That has nothing to do with RT, it's a problem of force threaded > > interrupts in combination with an edge type interrupt line and a > > hardware which keeps firing interrupts forever. > > Hello Thomas, thanks for your feedback! > > I agreed it has nothing to do with RT. > I just mentioned PREEMPT_RT as my test case scenario, since it enables > force-threaded IRQs. > > > > > > To fix this bug, reset irqs_unhandled whenever irq_thread handles at least > > > one IRQ request. > > > > This papers over the symptom and makes runaway detection way weaker for > > all interrupts or breaks it completely. > > This change is supposed to only touch threaded interruptions, since it will > reach the included line only if (action_ret == IRQ_WAKE_THREAD) and if > desc->threads_handled changes since the last IRQ request. > > This incrementing also happens only on irq_forced_thread_fn() and > irq_thread_fn(), which are called only from irq_thread_fn(). > > But I get the overall worry about having this making runaway detection way > weaker for all threaded interrupts. > > I have previously worked on a solution that can be more precise and be an > opt-in for drivers instead of a general solution: > > It required a change in IRQ interface that let the handlers inform how > many IRQs were actually handled (batching). This number would then be > added to desc->threads_handle (in irq_*thread_fn(), just changing the > atomic_inc() to atomic_add()), and then subtracted from irqs_unhandled > at note_interrupt(). > > In the serial8250 case, the driver would be changed to use that interface, > since it's already able to process multiple IRQs, and the bug just > vanishes. > > This also solved the serial driver issue, but required a deeper change in > the code, which caused me to consider a simpler solution first. > > This solution sure does give better runnaway detection. Do you think it > would be better that the one I sent in this patch? For reference, this is the alternative: https://gitlab.com/LeoBras/linux/-/commits/serial8250 Please let me know it you think this one is better. Thanks! Leo > > > > > The problem with edge type interrupts is that we cannot mask them like > > we do with level type interrupts in the hard interrupt handler and > > unmask them once the threaded handler finishes. > > > > So yes, we need special rules here when: > > > > 1) The interrupt handler is force threaded > > > > 2) The interrupt line is edge type > > > > 3) The accumulated unhandled interrupts are within a sane margin > > > > Thanks, > > > > tglx > > > > Completelly agree, that's why I am suggesting dealing with threaded > interruptions in a different way: reseting the unhandled count when it > handles a request. > > I am not sure how force threaded and just threaded are different in this > scenario. Could you help me understand? > > Thanks! > Leo