Quoting Douglas Anderson (2020-04-14 10:23:26) > diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c > index 732316bb67dc..4e45a8ac6cde 100644 > --- a/drivers/soc/qcom/rpmh-rsc.c > +++ b/drivers/soc/qcom/rpmh-rsc.c > @@ -791,36 +790,36 @@ static int rpmh_rsc_cpu_pm_callback(struct notifier_block *nfb, > { > struct rsc_drv *drv = container_of(nfb, struct rsc_drv, rsc_pm); > int ret = NOTIFY_OK; > - > - spin_lock(&drv->pm_lock); > + int cpus_in_pm; > > switch (action) { > case CPU_PM_ENTER: > - cpumask_set_cpu(smp_processor_id(), &drv->cpus_entered_pm); > - > - if (!cpumask_equal(&drv->cpus_entered_pm, cpu_online_mask)) > - goto exit; > + cpus_in_pm = atomic_inc_return(&drv->cpus_in_pm); > + if (cpus_in_pm < num_online_cpus()) Might be worth adding a comment here explaining that num_online_cpus() is stable because this is called from the cpu PM notifier path and a CPU can't go offline or come online without stopping the world. > + return NOTIFY_OK; > break; > case CPU_PM_ENTER_FAILED: > case CPU_PM_EXIT: > - cpumask_clear_cpu(smp_processor_id(), &drv->cpus_entered_pm); > - goto exit; > - } > - > - ret = rpmh_rsc_ctrlr_is_busy(drv); > - if (ret) { > - ret = NOTIFY_BAD; > - goto exit; > + atomic_dec(&drv->cpus_in_pm); We should also handle the cluster PM enums. I'm actually confused the compiler didn't complain about that already. Presumably we want to just ignore the cluster PM notifications because the counter handles it already. Looks like other code uses NOTIFY_DONE for the default case. > + return NOTIFY_OK; > } > > - ret = rpmh_flush(&drv->client); > - if (ret) > + /* > + * It's likely we're on the last CPU. Grab the drv->lock and write > + * out the sleep/wake commands to RPMH hardware. Grabbing the lock > + * means that if we race with another CPU coming up we are still > + * guaranteed to be safe. If another CPU came up just after we checked > + * and has already started an active transfer then we'll notice we're > + * busy and abort. If another CPU comes up after we start flushing it > + * will be blocked from starting an active transfer until we're done > + * flushing. If another CPU starts an active transfer after we release > + * the lock we're still OK because we're no longer the last CPU. > + */ > + spin_lock(&drv->lock); This should probably be a raw spinlock given that this is called from the idle path and sleeping there is not very nice for RT. That can come later of course. > + if (rpmh_rsc_ctrlr_is_busy(drv) || !rpmh_flush(&drv->client)) It almost feels like rpmh_rsc_ctrlr_is_busy() shold be rolled straight into rpmh_flush() so that rpmh_flush() fails if there are active requests in flight. > ret = NOTIFY_BAD; > - else > - ret = NOTIFY_OK; > + spin_unlock(&drv->lock); I'm looking at the latest linux-next code that I think has all the patches on the list for rpmh (latest commit is 1d3c6f86fd3f ("soc: qcom: rpmh: Allow RPMH driver to be loaded as a module")). I see that tcs->lock is taken first, and then drv->lock is taken next in tcs_write(). But then this takes drv->lock first and then calls rpmh_flush() (which goes to a different file.. yay!) and that calls flush_batch() which calls rpmh_rsc_write_ctrl_data() (back to this file... yay again!) which then locks tcs->lock. Isn't that an ABBA deadlock? > > -exit: > - spin_unlock(&drv->pm_lock); > return ret;