Re: [PATCH] remoteproc: k3-r5: Decouple firmware booting from probe routine

"Kumar, Udit" <u-kumar1@xxxxxx> · Sat, 7 Sep 2024 15:41:38 +0530

On 9/6/2024 3:10 PM, Beleswar Padhi wrote:
The current implementation of the waiting mechanism in probe() waits for
the 'released_from_reset' flag to be set which is done in
k3_r5_rproc_prepare() as part of rproc_fw_boot(). This causes unexpected
failures in cases where the firmware is unavailable at boot time,
resulting in probe failure and removal of the remoteproc handles in the
sysfs paths.

I won't say failure, I will say this is behavior of driver.

Driver expect firmware to be available , but I agree driver should be 
able to execute

with/without firmware availability.


To address this, the waiting mechanism is refactored out of the probe
routine into the appropriate k3_r5_rproc_prepare/unprepare() and
k3_r5_rproc_start/stop() functions. This allows the probe routine to
complete without depending on firmware booting, while still maintaining
the required power-synchronization between cores.

Fixes: 61f6f68447ab ("remoteproc: k3-r5: Wait for core0 power-up before powering up core1")
Signed-off-by: Beleswar Padhi <b-padhi@xxxxxx>
---
Posted this as a Fix as this was breaking usecases where we wanted to load a
firmware by writing to sysfs handles in userspace.

  drivers/remoteproc/ti_k3_r5_remoteproc.c | 170 ++++++++++++++++-------
  1 file changed, 118 insertions(+), 52 deletions(-)

diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c
[..]
+	core0 = list_first_entry(&cluster->cores, struct k3_r5_core, elem);
+	core1 = list_last_entry(&cluster->cores, struct k3_r5_core, elem);
+	if (cluster->mode == CLUSTER_MODE_SPLIT && core == core1 &&
+	    core0->released_from_reset == false) {
+		ret = wait_event_interruptible_timeout(cluster->core_transition,
+						       core0->released_from_reset,
+						       msecs_to_jiffies(2000));
only one wait in start should be good enough,
+		if (ret <= 0) {
+			dev_err(dev, "can not power up core1 before core0");
+			return -EPERM;
+		}
+	}
+
  	ret = ti_sci_proc_get_status(core->tsp, &boot_vec, &cfg, &ctrl, &stat);
  	if (ret < 0)
  		return ret;
@@ -470,6 +492,12 @@ static int k3_r5_rproc_prepare(struct rproc *rproc)
  		return ret;
  	}
  
+	/* Notify all threads in the wait queue when core state has changed so
+	 * that threads waiting for this condition can be executed.
+	 */
+	core->released_from_reset = true;
+	wake_up_interruptible(&cluster->core_transition);
+
  	/*
  	 * Newer IP revisions like on J7200 SoCs support h/w auto-initialization
  	 * of TCMs, so there is no need to perform the s/w memzero. This bit is
@@ -515,14 +543,46 @@ static int k3_r5_rproc_unprepare(struct rproc *rproc)
  {
  	struct k3_r5_rproc *kproc = rproc->priv;
  	struct k3_r5_cluster *cluster = kproc->cluster;
-	struct k3_r5_core *core = kproc->core;
+	struct k3_r5_core *core0, *core1, *core = kproc->core;


why you need wait in unprepare/stop,

In case you are failing during firmware load or so then already we are 
in bad state.

if this is call from user land then, i don't except anyone will auto 
trigger stopping of core-0

IMO, in unprepare/stop, if action is attempted on core1 with core-0 ON, 
simply return error.


[..]
@@ -629,7 +702,7 @@ static int k3_r5_rproc_stop(struct rproc *rproc)
  	struct k3_r5_rproc *kproc = rproc->priv;
  	struct k3_r5_cluster *cluster = kproc->cluster;
  	struct device *dev = kproc->dev;