Hi,

On Wed, Jun 8, 2022 at 9:13 AM Rob Clark <robdclark@xxxxxxxxx> wrote:
>
> From: Rob Clark <robdclark@xxxxxxxxxxxx>
>
> I've seen a few crashes like:
>
>     CPU: 0 PID: 216 Comm: A618-worker Tainted: G W 5.4.196 #7
>     Hardware name: Google Wormdingler rev1+ INX panel board (DT)
>     pstate: 20c00009 (nzCv daif +PAN +UAO)
>     pc : msm_readl+0x14/0x34
>     lr : a6xx_gpu_busy+0x40/0x80
>     sp : ffffffc011b93ad0
>     x29: ffffffc011b93ad0 x28: ffffffe77cba3000
>     x27: 0000000000000001 x26: ffffffe77bb4c4ac
>     x25: ffffffa2f227dfa0 x24: ffffffa2f22aab28
>     x23: 0000000000000000 x22: ffffffa2f22bf020
>     x21: ffffffa2f22bf000 x20: ffffffc011b93b10
>     x19: ffffffc011bd4110 x18: 000000000000000e
>     x17: 0000000000000004 x16: 000000000000000c
>     x15: 000001be3a969450 x14: 0000000000000400
>     x13: 00000000000101d6 x12: 0000000034155555
>     x11: 0000000000000001 x10: 0000000000000000
>     x9 : 0000000100000000 x8 : ffffffc011bd4000
>     x7 : 0000000000000000 x6 : 0000000000000007
>     x5 : ffffffc01d8b38f0 x4 : 0000000000000000
>     x3 : 00000000ffffffff x2 : 0000000000000002
>     x1 : 0000000000000000 x0 : ffffffc011bd4110
>     Call trace:
>      msm_readl+0x14/0x34
>      a6xx_gpu_busy+0x40/0x80
>      msm_devfreq_get_dev_status+0x70/0x1d0
>      devfreq_simple_ondemand_func+0x34/0x100
>      update_devfreq+0x50/0xe8
>      qos_notifier_call+0x2c/0x64
>      qos_max_notifier_call+0x1c/0x2c
>      notifier_call_chain+0x58/0x98
>      __blocking_notifier_call_chain+0x74/0x84
>      blocking_notifier_call_chain+0x38/0x48
>      pm_qos_update_target+0xf8/0x19c
>      freq_qos_apply+0x54/0x6c
>      apply_constraint+0x60/0x104
>      __dev_pm_qos_update_request+0xb4/0x184
>      dev_pm_qos_update_request+0x38/0x58
>      msm_devfreq_idle_work+0x34/0x40
>      kthread_worker_fn+0x144/0x1c8
>      kthread+0x140/0x284
>      ret_from_fork+0x10/0x18
>     Code: f9000bf3 910003fd aa0003f3 d503201f (b9400260)
>     ---[ end trace f6309767a42d0831 ]---
>
> Which smells a lot like touching hw after power collapse.  This seems
> a bit like a race/timing issue elsewhere, as pm_runtime_get_if_in_use()
> in a6xx_gpu_busy() should have kept us from touching hw if it wasn't
> powered.

I dunno if we want to change the commit message since I think my patch
[1] addresses the above problem?

[1] https://lore.kernel.org/r/20220609094716.v2.1.Ie846c5352bc307ee4248d7cab998ab3016b85d06@changeid

> But, we've seen cases where the idle_work scheduled by
> msm_devfreq_idle() ends up racing with the resume path.  Which, again,
> shouldn't be a problem other than unnecessary freq changes.
>
> v2. Only move the runpm _put_autosuspend, and not the _mark_last_busy()
>
> Fixes: 9bc95570175a ("drm/msm: Devfreq tuning")
> Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
> Link: https://lore.kernel.org/r/20210927152928.831245-1-robdclark@xxxxxxxxx
> ---
>  drivers/gpu/drm/msm/msm_gpu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

In any case, your patch fixes the potential WARN_ON and seems like the
right thing to do, so:

Reviewed-by: Douglas Anderson <dianders@xxxxxxxxxxxx>