Multiple calls to glink_subdev_stop() for the same remoteproc can happen
if rproc_stop() fails in Process-A, which leaves the rproc in the
RPROC_CRASHED state. A later write to the recovery sysfs attribute from
user space in Process-B calls recovery_store(), which triggers
rproc_trigger_recovery() on the same remoteproc. Because the first
rproc_stop() had already stopped the subdevices before the SMC shutdown
call failed, glink->edge is already NULL at this point, and the second
glink_subdev_stop() results in a NULL pointer dereference in
qcom_glink_smem_unregister().

Fixing this by adding a NULL check on glink->edge would only hide the
problem: it does not guarantee that the remoteproc recovers on the
second attempt from Process-B. The stop already failed in Process-A
during the SMC shutdown call and may fail again at the same call, in
which case the rproc cannot recover.

Add a new rproc state, RPROC_DEFUNCT, to mark a remoteproc as
non-recoverable; the only way to recover from it is a system restart.
rproc_stop() moves the rproc into this state when the ->stop() callback
fails, and rproc_trigger_recovery() skips recovery for a defunct rproc.

Process-A                     Process-B

fatal error interrupt happens

rproc_crash_handler_work()
  mutex_lock_interruptible(&rproc->lock);
  ...
  rproc->state = RPROC_CRASHED;
  ...
  mutex_unlock(&rproc->lock);

rproc_trigger_recovery()
  mutex_lock_interruptible(&rproc->lock);

  adsp_stop()
  qcom_q6v5_pas 20c00000.remoteproc: failed to shutdown: -22
  remoteproc remoteproc3: can't stop rproc: -22
  mutex_unlock(&rproc->lock);

                              echo enabled > /sys/class/remoteproc/remoteprocX/recovery
                              recovery_store()
                               rproc_trigger_recovery()
                                mutex_lock_interruptible(&rproc->lock);
                                 rproc_stop()
                                  glink_subdev_stop()
                                   qcom_glink_smem_unregister() ==|
                                                                  |
                                                                  V
                                   Unable to handle kernel NULL pointer dereference
                                   at virtual address 0000000000000358

Signed-off-by: Mukesh Ojha <quic_mojha@xxxxxxxxxxx>
---
Changes in v3:
 - Fix the error reported by the kernel test robot.

Changes in v2:
 - Removed the NULL pointer check; instead, added a new state to signify
   the non-recoverable state of remoteproc.

 drivers/remoteproc/remoteproc_core.c  | 3 ++-
 drivers/remoteproc/remoteproc_sysfs.c | 1 +
 include/linux/remoteproc.h            | 5 ++++-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index f276956f2c5c..c4e14503b971 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1727,6 +1727,7 @@ static int rproc_stop(struct rproc *rproc, bool crashed)
 	/* power off the remote processor */
 	ret = rproc->ops->stop(rproc);
 	if (ret) {
+		rproc->state = RPROC_DEFUNCT;
 		dev_err(dev, "can't stop rproc: %d\n", ret);
 		return ret;
 	}
@@ -1839,7 +1840,7 @@ int rproc_trigger_recovery(struct rproc *rproc)
 		return ret;
 
 	/* State could have changed before we got the mutex */
-	if (rproc->state != RPROC_CRASHED)
+	if (rproc->state == RPROC_DEFUNCT || rproc->state != RPROC_CRASHED)
 		goto unlock_mutex;
 
 	dev_err(dev, "recovering %s\n", rproc->name);
diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
index 138e752c5e4e..5f722b4576b2 100644
--- a/drivers/remoteproc/remoteproc_sysfs.c
+++ b/drivers/remoteproc/remoteproc_sysfs.c
@@ -171,6 +171,7 @@ static const char * const rproc_state_string[] = {
 	[RPROC_DELETED]		= "deleted",
 	[RPROC_ATTACHED]	= "attached",
 	[RPROC_DETACHED]	= "detached",
+	[RPROC_DEFUNCT]		= "defunct",
 	[RPROC_LAST]		= "invalid",
 };
 
diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
index b4795698d8c2..3e4ba06c6a9a 100644
--- a/include/linux/remoteproc.h
+++ b/include/linux/remoteproc.h
@@ -417,6 +417,8 @@ struct rproc_ops {
  *			has attached to it
  * @RPROC_DETACHED:	device has been booted by another entity and waiting
  *			for the core to attach to it
+ * @RPROC_DEFUNCT:	device neither crashed nor responding to any of the
+ *			requests and can only recover on system restart.
  * @RPROC_LAST:		just keep this one at the end
  *
  * Please note that the values of these states are used as indices
@@ -433,7 +435,8 @@ enum rproc_state {
 	RPROC_DELETED	= 4,
 	RPROC_ATTACHED	= 5,
 	RPROC_DETACHED	= 6,
-	RPROC_LAST	= 7,
+	RPROC_DEFUNCT	= 7,
+	RPROC_LAST	= 8,
 };
 
 /**
-- 
2.34.1