Re: [PATCH v4] remoteproc: core: do pm relax when in RPROC_OFFLINE

"Aiqun(Maria) Yu" <quic_aiquny@xxxxxxxxxxx> · Thu, 13 Oct 2022 09:40:09 +0800

Hi Mathieu,

On 10/13/2022 4:43 AM, Mathieu Poirier wrote:
Please add what has changed from one version to another, either in a cover
letter or after the "Signed-off-by".  There are many examples on how to do that
on the mailing list.

Thanks for the information. I will note this and include a changelog next time.

On Fri, Sep 16, 2022 at 03:12:31PM +0800, Maria Yu wrote:
RPROC_OFFLINE state indicate there is no recovery process
is in progress and no chance to do the pm_relax.
Because when recovering from crash, rproc->lock is held and
state is RPROC_CRASHED -> RPROC_OFFLINE -> RPROC_RUNNING,
and then unlock rproc->lock.

You are correct - because the lock is held rproc->state should be set to RPROC_RUNNING
when rproc_trigger_recovery() returns.  If that is not the case then something
went wrong.

Function rproc_stop() sets rproc->state to RPROC_OFFLINE just before returning,
so we know the remote processor was stopped.  Therefore if rproc->state is set
to RPROC_OFFLINE something went wrong in either request_firmware() or
rproc_start().  Either way the remote processor is offline and the system is probably
in an unknown/unstable state.  As such I don't see how calling pm_relax() can help
things along.
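
For readers following the state argument above, here is a minimal user-space sketch of the transitions being described. It is not kernel code: the names model_rproc, model_stop, model_start and model_trigger_recovery are hypothetical stand-ins, and the start_ok flag is an assumption that lumps request_firmware() and rproc_start() into a single success/failure result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's struct rproc. */
enum model_state { MODEL_OFFLINE, MODEL_RUNNING, MODEL_CRASHED };

struct model_rproc {
	enum model_state state;
};

/* Models rproc_stop(): the state is OFFLINE once the stop completes. */
void model_stop(struct model_rproc *r)
{
	r->state = MODEL_OFFLINE;
}

/*
 * Models firmware loading plus rproc_start(). 'ok' is an assumed
 * stand-in for the combined result of those two steps.
 */
int model_start(struct model_rproc *r, bool ok)
{
	if (!ok)
		return -1;	/* state is left at OFFLINE */
	r->state = MODEL_RUNNING;
	return 0;
}

/*
 * Models rproc_trigger_recovery(), assumed to run with the lock held:
 * CRASHED -> OFFLINE -> RUNNING on success, stuck at OFFLINE on failure.
 */
int model_trigger_recovery(struct model_rproc *r, bool start_ok)
{
	assert(r->state == MODEL_CRASHED);
	model_stop(r);
	return model_start(r, start_ok);
}
```

In this model, a recovery that returns with the state still at OFFLINE can only have failed in the start step, which is the point made above.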

RPROC_OFFLINE is also possible when rproc_shutdown is triggered and
finishes successfully.
Even if the cause is contention between multiple
rproc_crash_handler_work instances, and the last rproc_trigger_recovery
bailed out leaving rproc->state == RPROC_OFFLINE, it is still worth
calling pm_relax to keep the pairing balanced.
The subsystem may still be recoverable by the customer's next trigger
of rproc_start, and this keeps every error path clean with respect to
PM resources.
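
The pairing argument above can be illustrated with a small user-space sketch. It is only a model under stated assumptions, not the actual patch: sk_rproc, sk_report_crash, sk_shutdown and sk_crash_work are hypothetical names, the wakeup source is reduced to a boolean, and it is assumed that the wakeup source is taken when the crash is reported and that the shutdown path does not release it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's struct rproc. */
enum sk_state { SK_OFFLINE, SK_RUNNING, SK_CRASHED };

struct sk_rproc {
	enum sk_state state;
	bool wakeup_held;	/* stand-in for the wakeup source */
};

/* Assumption: reporting a crash takes the wakeup source, then queues work. */
void sk_report_crash(struct sk_rproc *r)
{
	r->wakeup_held = true;
}

/* Assumption: a concurrent shutdown sets OFFLINE and leaves the wakeup source alone. */
void sk_shutdown(struct sk_rproc *r)
{
	r->state = SK_OFFLINE;
}

/*
 * Models the crash handler work. 'relax_when_offline' selects between the
 * behaviour without the extra relax (false) and with it (true).
 */
void sk_crash_work(struct sk_rproc *r, bool relax_when_offline)
{
	if (r->state == SK_CRASHED || r->state == SK_OFFLINE) {
		/* early return: only the first crash is handled */
		if (r->state == SK_OFFLINE && relax_when_offline)
			r->wakeup_held = false;
		return;
	}

	/* normal recovery: CRASHED -> OFFLINE -> RUNNING, then relax */
	r->state = SK_CRASHED;
	r->state = SK_OFFLINE;
	r->state = SK_RUNNING;
	assert(r->state == SK_RUNNING);
	r->wakeup_held = false;
}
```

Calling sk_report_crash(), then sk_shutdown(), then sk_crash_work() with relax_when_offline set to false leaves wakeup_held set in this model, while passing true clears it.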

I suggest spending time understanding what leads to the failure when recovering
from a crash and address that problem(s).

In the current case, the customer reports that the issue happened when
rproc_shutdown was triggered at about the same time, so it is not caused
by rproc_trigger_recovery erroring out.

Thanks,
Mathieu


When the state is in RPROC_OFFLINE it means separate request
of rproc_stop was done and no need to hold the wakeup source
in crash handler to recover any more.

Signed-off-by: Maria Yu <quic_aiquny@xxxxxxxxxxx>
---
  drivers/remoteproc/remoteproc_core.c | 11 +++++++++++
  1 file changed, 11 insertions(+)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index e5279ed9a8d7..6bc7b8b7d01e 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1956,6 +1956,17 @@ static void rproc_crash_handler_work(struct work_struct *work)
  	if (rproc->state == RPROC_CRASHED || rproc->state == RPROC_OFFLINE) {
  		/* handle only the first crash detected */
  		mutex_unlock(&rproc->lock);
+		/*
+		 * RPROC_OFFLINE state indicate there is no recovery process
+		 * is in progress and no chance to have pm_relax in place.
+		 * Because when recovering from crash, rproc->lock is held and
+		 * state is RPROC_CRASHED -> RPROC_OFFLINE -> RPROC_RUNNING,
+		 * and then unlock rproc->lock.
+		 * RPROC_OFFLINE is only an intermediate state in recovery
+		 * process.
+		 */
+		if (rproc->state == RPROC_OFFLINE)
+			pm_relax(rproc->dev.parent);
  		return;
  	}
  
--
2.7.4



--
Thx and BRs,
Aiqun(Maria) Yu