On Wed, Apr 29, 2020 at 09:44:02AM +0200, Arnaud POULIQUEN wrote: > Hi Mathieu, > > On 4/24/20 10:01 PM, Mathieu Poirier wrote: > > Refactor function rproc_trigger_recovery() in order to avoid > > reloading the firmware image when synchronising with a remote > > processor rather than booting it. Also part of the process, > > properly set the synchronisation flag in order to properly > > recover the system. > > > > Signed-off-by: Mathieu Poirier <mathieu.poirier@xxxxxxxxxx> > > --- > > drivers/remoteproc/remoteproc_core.c | 23 ++++++++++++++------ > > drivers/remoteproc/remoteproc_internal.h | 27 ++++++++++++++++++++++++ > > 2 files changed, 43 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c > > index ef88d3e84bfb..3a84a38ba37b 100644 > > --- a/drivers/remoteproc/remoteproc_core.c > > +++ b/drivers/remoteproc/remoteproc_core.c > > @@ -1697,7 +1697,7 @@ static void rproc_coredump(struct rproc *rproc) > > */ > > int rproc_trigger_recovery(struct rproc *rproc) > > { > > - const struct firmware *firmware_p; > > + const struct firmware *firmware_p = NULL; > > struct device *dev = &rproc->dev; > > int ret; > > > > @@ -1718,14 +1718,16 @@ int rproc_trigger_recovery(struct rproc *rproc) > > /* generate coredump */ > > rproc_coredump(rproc); > > > > - /* load firmware */ > > - ret = request_firmware(&firmware_p, rproc->firmware, dev); > > - if (ret < 0) { > > - dev_err(dev, "request_firmware failed: %d\n", ret); > > - goto unlock_mutex; > > + /* load firmware if need be */ > > + if (!rproc_needs_syncing(rproc)) { > > + ret = request_firmware(&firmware_p, rproc->firmware, dev); > > + if (ret < 0) { > > + dev_err(dev, "request_firmware failed: %d\n", ret); > > + goto unlock_mutex; > > + } > > If we started in syncing mode then rpoc->firmware is null > rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED) can make rproc_needs_syncing(rproc) > false. You are correct, I will add an additional check in rproc_set_machine() to prevent a situation where rproc_alloc() has been called without an ops and any of the synchronisation flags are set to false. It is also possible that someone would call proc_alloc() without an ops and doesn't call rproc_set_state_machine(), in which case both ops and sync_ops would be NULL. Adding a check in rproc_add() is probably the best location to catch such a condition. > In this case here we fail the recovery an leave in RPROC_STOP state. > As you proposed in Loic RFC[1], what about adding a more explicit message to inform that the recovery > failed. Right, that's a different problem. > > [1]https://lkml.org/lkml/2020/3/11/402 > > Regards, > Arnaud > > } > > > > - /* boot the remote processor up again */ > > + /* boot up or synchronise with the remote processor again */ > > ret = rproc_start(rproc, firmware_p); > > > > release_firmware(firmware_p); > > @@ -1761,6 +1763,13 @@ static void rproc_crash_handler_work(struct work_struct *work) > > dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt, > > rproc->name); > > > > + /* > > + * The remote processor has crashed - tell the core what operation > > + * to use from hereon, i.e whether an external entity will reboot > > + * the MCU or it is now the remoteproc core's responsability. > > + */ > > + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED); > > + > > mutex_unlock(&rproc->lock); > > > > if (!rproc->recovery_disabled) > > diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h > > index 3985c084b184..61500981155c 100644 > > --- a/drivers/remoteproc/remoteproc_internal.h > > +++ b/drivers/remoteproc/remoteproc_internal.h > > @@ -24,6 +24,33 @@ struct rproc_debug_trace { > > struct rproc_mem_entry trace_mem; > > }; > > > > +/* > > + * enum rproc_sync_states - remote processsor sync states > > + * > > + * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor > > + * has crashed but has not been recovered by > > + * the remoteproc core yet. > > + * > > + * Keeping these separate from the enum rproc_state in order to avoid > > + * introducing coupling between the state of the MCU and the synchronisation > > + * operation to use. > > + */ > > +enum rproc_sync_states { > > + RPROC_SYNC_STATE_CRASHED, > > +}; > > + > > +static inline void rproc_set_sync_flag(struct rproc *rproc, > > + enum rproc_sync_states state) > > +{ > > + switch (state) { > > + case RPROC_SYNC_STATE_CRASHED: > > + rproc->sync_with_rproc = rproc->sync_flags.after_crash; > > + break; > > + default: > > + break; > > + } > > +} > > + > > /* from remoteproc_core.c */ > > void rproc_release(struct kref *kref); > > irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int vq_id); > >