On Thu, May 27, 2021 at 10:55:05PM -0500, Bjorn Andersson wrote: > On Wed 19 May 18:44 CDT 2021, Alex Elder wrote: > > > When a remoteproc has crashed, rproc_report_crash() is called to > > handle whatever recovery is desired. This can happen at almost any > > time, often triggered by an interrupt, though it can also be > > initiated by a write to debugfs file remoteproc/remoteproc*/crash. > > > > When a crash is reported, the crash handler worker is scheduled to > > run (rproc_crash_handler_work()). One thing that worker does is > > call rproc_trigger_recovery(), which calls rproc_stop(). That calls > > the ->stop method for any remoteproc subdevices before making the > > remote processor go offline. > > > > The Q6V5 modem remoteproc driver implements an SSR subdevice that > > notifies registered drivers when the modem changes operational state > > (prepare, started, stop/crash, unprepared). The IPA driver > > registers to receive these notifications. > > > > With that as context, I'll now describe the problem. > > > > There was a situation in which buggy modem firmware led to a modem > > crash very soon after system (AP) resume had begun. The crash caused > > a remoteproc SSR crash notification to be sent to the IPA driver. > > The problem was that, although system resume had begun, it had not > > yet completed, and the IPA driver was still in a suspended state. > > > > This scenario could happen to any driver that registers for these > > SSR notifications, because they are delivered without knowledge of > > the (suspend) state of registered recipient drivers. > > > > This patch offers a simple fix for this, by having the crash > > handling worker function run on the system freezable workqueue. > > This workqueue does not operate if user space is frozen (for > > suspend). As a result, the SSR subdevice only delivers its > > crash notification when the system is fully operational (i.e., > > neither suspended nor in suspend/resume transition). > > > > This makes sense to me; both that it ensures that we spend our resources > on the actual system resume and that it avoids surprises from this > happening while the system still is in a funky state... > > Reviewed-by: Bjorn Andersson <bjorn.andersson@xxxxxxxxxx> > > But it would be nice to get some input from other users of the > framework. > This patch is in my review queue - I should be able to get to it by the end of next week. > Regards, > Bjorn > > > Signed-off-by: Alex Elder <elder@xxxxxxxxxx> > > --- > > drivers/remoteproc/remoteproc_core.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c > > index 39cf44cb08035..6bedf2d2af239 100644 > > --- a/drivers/remoteproc/remoteproc_core.c > > +++ b/drivers/remoteproc/remoteproc_core.c > > @@ -2724,8 +2724,8 @@ void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type) > > dev_err(&rproc->dev, "crash detected in %s: type %s\n", > > rproc->name, rproc_crash_to_string(type)); > > > > - /* create a new task to handle the error */ > > - schedule_work(&rproc->crash_handler); > > + /* Have a worker handle the error; ensure system is not suspended */ > > + queue_work(system_freezable_wq, &rproc->crash_handler); > > } > > EXPORT_SYMBOL(rproc_report_crash); > > > > -- > > 2.27.0 > >