Hi Péter, On 29.05.23 at 08:46, Péter Ujfalusi wrote: >> My hope was that it could also avoid some of (existing or future) DMI entries. But even if it does not >> (e.g. the problem Péter Ujfalusi reported with the UPX-i11 cannot be fixed by this patch set and thus >> needs the DMI quirk) we may at least avoid more bug reports due to interrupt storms once >> 6.4 is released. > > I'm surprised that there is a need for a storm detection in the first > place... Do we have something else on the same IRQ line on the affected > devices which might have a bug or no driver at all? > It is hard to believe that a TPM (Trusted Platform Module) is integrated > so poorly ;) > > But put that aside: I think the storm detection is good given that there > is no other way to know which machine have sloppy TPM integration. > There are machines where this happens, so it is a know integration > issue, right? >> My only 'nitpick' is with the printk level to be used. > The ERR level is not correct as we know the issue and we handle it, so > all is under control. > If we want to add these machines to the quirk list then WARN is a good > level to gain attention but I'm not sure if a user will know how to get > the machine in the quirk (where to file a bug). > If we only want the quirk to be used for machines like UPX-i11 which > simply just have broken (likely floating) IRQ line then the WARN is too > high level, INFO or even DBG would be appropriate as you are not going > to update the quirk, it is just handled under the hood (which is a great > thing, but on the other hand you will have the storm never the less and > that is not a nice thing). > > It is a matter on how this is going to be handled in a long term. Add > quirk for all the known machines with either stormy or plain broken IRQ > line or handle the stormy ones and quirk the broken ones only. > Even in the long run I would always prefer a generic solution whenever it is possible. Quirks are fine for issues that cant be solved in another way or really require individual treatment. While I agree that ERR is not a good choice for the "falling back to polling" message I do not have a strong opinion on whether WARN, NOTICE or INFO is more appropriate. Jarko seems to prefer WARN. >>>> Detect an interrupt storm by counting the number of unhandled interrupts >>>> within a 10 ms time interval. In case that more than 1000 were unhandled >>>> deactivate interrupts, deregister the handler and fall back to polling. >>> >>> I know it can be sometimes hard to evaluate but can you try to explain >>> how you came up to the 10 ms sampling period and 1000 interrupt >>> threshold? I just don't like abritrary numbers. >> >> At least the 100 ms is not plucked out of thin air but its the same time period >> that the generic code in note_interrupt() uses - I assume for a good reason. >> Not only this number but the whole irq storm detection logic is taken from >> there: >> >>> >>>> This equals the implementation that handles interrupt storms in >>>> note_interrupt() by means of timestamps and counters in struct irq_desc. >>> The number of 1000 unhandled interrupts is still far below the 99900 > used in >> note_interrupt() but IMHO enough to indicate that there is something seriously >> wrong with interrupt processing and it is probably saver to fall back to polling. > > Except that if the line got the spurious designation in core, the > interrupt line will be disabled while the TPM driver will think that it > is still using IRQ mode and will not switch to polling. In the case that an interrupt storm cant be detected (since there might not even be one) I am fine with adding a quirk. > > A storm of 1000 is better than a storm of 99900 for sure but quirking > these would be the desired final solution. imho > As I wrote above I prefer a generic solution whenever possible. > There are many buts around this ;) > Regards, Lino