On Fri, 14 Aug 2020 07:24:52 +0200 Lars-Peter Clausen <lars@xxxxxxxxxx> wrote: > On 8/13/20 4:55 PM, Jonathan Cameron wrote: > > On Thu, 13 Aug 2020 14:19:57 +0200 > > Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > > > >> Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> writes: > >>> On 2020-08-13 11:46:30 [+0200], Lars-Peter Clausen wrote: > >>>> If you are running with forced IRQ threads the only thing that will then > >>>> happen in the actual hard IRQ context is the launching of the IRQ threads. > >>>> Th e IRQ handler of the device driver will run in a threaded IRQ. > >>> So if it is really just the wakeup of the IRQ-thread then it should be > >>> okay. > >>> One thing: iio_trigger_poll() may invoke iio_trigger_notify_done(). This > >>> would invoke trig->ops->try_reenable callback if available. > >>> I grepped and found > >>> - bma180_trig_try_reen() > >>> It appears to perform i2c_smbus_read_byte_data() and smbus sounds > >>> sleeping. I don't know if it attempts to acquire any spinlock_t but it > >>> will be wrong on RT. > >> It's wrong even on !RT. i2c reads cannot be invoked from hard interrupt > >> context. > >> > > We would hit this (and resulting warnings) all the time if it actually > > happened, so my suspicion is that it doesn't. > > > > I think the path doesn't actually exist although it looks at first glance like it does. > > > > The interrupt can only be enabled if there is someone using the trigger. > > Thus usecount will be non zero and for at least one element > > trig->subirq[i].enabled == true > > > > So we will decrement trig->usecount in the call to iio_trigger_notify_done > > but never reach 0 thus the call to trig->ops->try_reenable never happens > > in the hard interrupt context. > > I think there is a race condition here. If a consumer is disabled > concurrently with iio_trigger_poll() there is a chance that `enabled` is > false for all consumers. > > The odds of this happening are very low, but there is nothing that > prevents it. A consumer disable should have previously resulted in the trigger stopping (or do we have he ordering wrong there?) Might be worth checking that, but see below, as I think we can stop it having harmful effects anyway. > > > > > It does happen later when which ever driver we triggered finishes the > > threaded part of it's handler and calls iio_trigger_notify_done, but that > > is fine. > > > > Assuming people agree with my analysis it would be good to make it explicit > > that we cannot hit the problem path. > > > > Perhaps call a new iio_trigger_notify_no_needed() that simply does > > the decrement without test, or does it with test and spits out a > > warning if we hit 0. > > I think we need to re-think the whole try_reenable() functionality. > Looking at it I think there are more issues here. > > For example lets say we call iio_trigger_notify_done() from the threaded > handler and try_reenable() returns true. We'd now call > iio_trigger_poll() from the threaded context, which is wrong. The history behind that functionality was some early devices which were using level interrupts tied to edge interrupts. One of our earliest drivers had to do a dance with checking a gpio to avoid stalling. That went away a long time ago when we basically concluded that platform was too broken to support interrupts on. Probably would have helped if try_reenable returned a bool as then people wouldn't have returned error codes via it which is completely wrong as it stands (though handy given what I suggest below) >From a quick look. bma180, bmc150, kxcjk1013, mxc4005m bmg180, kmx61, bmc150 all only fail if an error occurs in an bus access. That's a bug. at91 never returns anything other than 0. So I think we can probably rip the iio_trigger_poll call out entirely and replace it with an error message, or push an error message into the drivers and drop the check of the value of try_reenable. > > There is also the issue that iio_trigger_poll() effectively can end up > calling itself recursively indefinitely via the > iio_trigger_notify_done() chain, which might not be the best thing in > hard IRQ context. > :) Indeed unwise. I think the only driver that could in theory do this that we had a long time back had a counter in the try_reen to give up if it happened more than a few times. Jonathan