[PATCH v3] remoteproc: Add a new remoteproc state RPROC_DEFUNCT

Multiple calls to glink_subdev_stop() for the same remoteproc can happen
if rproc_stop() fails in Process A, leaving the rproc in the
RPROC_CRASHED state. A later write to the recovery sysfs file from user
space in Process B then calls rproc_trigger_recovery() on the same
remoteproc, which results in a NULL pointer dereference in
qcom_glink_smem_unregister().

Adding a NULL check on glink->edge would avoid the dereference, but it
does not guarantee that the remoteproc recovers on the second attempt
from Process B. The first attempt in Process A failed during the SMC
shutdown call, and the second attempt may fail at the same call, in
which case the rproc cannot recover.

Add a new rproc state, RPROC_DEFUNCT, for a remoteproc that cannot be
recovered; the only way out of this state is a system restart.

	Process-A                			Process-B

  fatal error interrupt happens

  rproc_crash_handler_work()
    mutex_lock_interruptible(&rproc->lock);
    ...

       rproc->state = RPROC_CRASHED;
    ...
    mutex_unlock(&rproc->lock);

    rproc_trigger_recovery()
     mutex_lock_interruptible(&rproc->lock);

      adsp_stop()
      qcom_q6v5_pas 20c00000.remoteproc: failed to shutdown: -22
      remoteproc remoteproc3: can't stop rproc: -22
     mutex_unlock(&rproc->lock);

						echo enabled > /sys/class/remoteproc/remoteprocX/recovery
						recovery_store()
						 rproc_trigger_recovery()
						  mutex_lock_interruptible(&rproc->lock);
						   rproc_stop()
						    glink_subdev_stop()
						      qcom_glink_smem_unregister() ==|
                                                                                     |
                                                                                     V
						      Unable to handle kernel NULL pointer dereference
                                                                at virtual address 0000000000000358

Signed-off-by: Mukesh Ojha <quic_mojha@xxxxxxxxxxx>
---
Changes in v3:
 - Fix kernel test reported error.

Changes in v2:
 - Removed the NULL pointer check; instead added a new state to signify
   a non-recoverable remoteproc.

 drivers/remoteproc/remoteproc_core.c  | 3 ++-
 drivers/remoteproc/remoteproc_sysfs.c | 1 +
 include/linux/remoteproc.h            | 5 ++++-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index f276956f2c5c..c4e14503b971 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1727,6 +1727,7 @@ static int rproc_stop(struct rproc *rproc, bool crashed)
 	/* power off the remote processor */
 	ret = rproc->ops->stop(rproc);
 	if (ret) {
+		rproc->state = RPROC_DEFUNCT;
 		dev_err(dev, "can't stop rproc: %d\n", ret);
 		return ret;
 	}
@@ -1839,7 +1840,7 @@ int rproc_trigger_recovery(struct rproc *rproc)
 		return ret;
 
 	/* State could have changed before we got the mutex */
-	if (rproc->state != RPROC_CRASHED)
+	if (rproc->state == RPROC_DEFUNCT || rproc->state != RPROC_CRASHED)
 		goto unlock_mutex;
 
 	dev_err(dev, "recovering %s\n", rproc->name);
diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
index 138e752c5e4e..5f722b4576b2 100644
--- a/drivers/remoteproc/remoteproc_sysfs.c
+++ b/drivers/remoteproc/remoteproc_sysfs.c
@@ -171,6 +171,7 @@ static const char * const rproc_state_string[] = {
 	[RPROC_DELETED]		= "deleted",
 	[RPROC_ATTACHED]	= "attached",
 	[RPROC_DETACHED]	= "detached",
+	[RPROC_DEFUNCT]		= "defunct",
 	[RPROC_LAST]		= "invalid",
 };
 
diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
index b4795698d8c2..3e4ba06c6a9a 100644
--- a/include/linux/remoteproc.h
+++ b/include/linux/remoteproc.h
@@ -417,6 +417,8 @@ struct rproc_ops {
  *			has attached to it
  * @RPROC_DETACHED:	device has been booted by another entity and waiting
  *			for the core to attach to it
+ * @RPROC_DEFUNCT:	device failed to stop and does not respond to requests;
+ *			it can only be recovered by a system restart
  * @RPROC_LAST:		just keep this one at the end
  *
  * Please note that the values of these states are used as indices
@@ -433,7 +435,8 @@ enum rproc_state {
 	RPROC_DELETED	= 4,
 	RPROC_ATTACHED	= 5,
 	RPROC_DETACHED	= 6,
-	RPROC_LAST	= 7,
+	RPROC_DEFUNCT	= 7,
+	RPROC_LAST	= 8,
 };
 
 /**
-- 
2.34.1




