On Tue, Jun 25, 2024 at 04:21:11PM GMT, Stefan Wahren wrote: > Am 24.06.24 um 14:57 schrieb Maxime Ripard: > > On Sun, Jun 16, 2024 at 11:27:08AM GMT, Stefan Wahren wrote: > > > Hi, > > > i'm currently experiment with suspend to idle on the Raspberry Pi [1]. > > > > > > During my tests, i noticed a WARNING of the vc4 during suspend incl. > > > Runtime PM usage count underflow. It would be nice if someone can look > > > at it. In case you want to reproduce it, i can prepare a branch with > > > some improvements/hacks. For example i disabled dwc2/USB because it > > > cause a lot of trouble during resume. > > > > > > Here are the steps to trigger this issue: > > > - make sure necessary kernel options are enabled ( CONFIG_SUSPEND, > > > CONFIG_PM_DEBUG, CONFIG_PM_ADVANCED_DEBUG ) > > > - make sure no HDMI cable is connected to Raspberry Pi 3 Plus > sorry, i forgot the specific type of the Raspberry Pi 3 B Plus, but the > issue on Pi 3 A Plus is the same. > > > - Add "no_console_suspend" to cmdline.txt and reboot > > > - Connect via Debug UART: > > > > > > echo 1 > /sys/power/pm_debug_messages > > > echo platform > /sys/power/pm_test > > > echo freeze > /sys/power/state > > > > > > Here is the output: > > > > > > [ 75.538497] PM: suspend entry (s2idle) > > > [ 76.915786] Filesystems sync: 1.377 seconds > > > [ 79.678262] rpi_firmware_set_power: Set HDMI to 1 > > > [ 79.678634] rpi_firmware_set_power: Set HDMI to 0 > > > [ 79.850949] Freezing user space processes > > > [ 79.852460] Freezing user space processes completed (elapsed 0.001 > > > seconds) > > > [ 79.852484] OOM killer disabled. > > > [ 79.852493] Freezing remaining freezable tasks > > > [ 79.853684] Freezing remaining freezable tasks completed (elapsed > > > 0.001 seconds) > > > [ 80.892819] ieee80211 phy0: brcmf_fil_cmd_data: bus is down. we have > > > nothing to do. > > > [ 80.892843] ieee80211 phy0: brcmf_cfg80211_get_tx_power: error (-5) > > > [ 81.514053] PM: suspend of devices complete after 1659.336 msecs > > > [ 81.514089] PM: start suspend of devices complete after 1660.386 msecs > > > [ 81.515616] PM: late suspend of devices complete after 1.509 msecs > > > [ 81.515938] rpi_firmware_set_power: Set VEC to 0 > > > [ 81.516010] rpi_firmware_set_power: Set V3D to 0 > > > [ 81.516998] PM: noirq suspend of devices complete after 1.239 msecs > > > [ 81.517016] PM: suspend debug: Waiting for 5 second(s). > > > [ 89.598310] rpi_firmware_set_power: Set V3D to 1 > > > [ 90.078228] ------------[ cut here ]------------ > > > [ 90.078240] WARNING: CPU: 1 PID: 216 at > > > drivers/gpu/drm/vc4/vc4_hdmi.c:477 > > > vc4_hdmi_connector_detect_ctx+0x2e4/0x34c [vc4] > > > [ 90.078344] Modules linked in: aes_arm aes_generic cbc crypto_simd > > > cryptd algif_skcipher af_alg brcmfmac_wcc brcmfmac vc4 brcmutil > > > snd_soc_hdmi_codec snd_soc_core ac97_bus sha256_generic > > > snd_pcm_dmaengine libsha256 snd_pcm sha256_arm snd_timer hci_uart > > > cfg80211 btbcm snd bluetooth soundcore drm_dma_helper crc32_arm_ce > > > ecdh_generic ecc raspberrypi_hwmon libaes bcm2835_thermal > > > [ 90.078551] CPU: 1 PID: 216 Comm: kworker/1:2 Not tainted 6.9.3-dirty #30 > > > [ 90.078568] Hardware name: BCM2835 > > > [ 90.078580] Workqueue: events output_poll_execute > > > [ 90.078610] Call trace: > > > [ 90.078624] unwind_backtrace from show_stack+0x10/0x14 > > > [ 90.078660] show_stack from dump_stack_lvl+0x50/0x64 > > > [ 90.078688] dump_stack_lvl from __warn+0x7c/0x124 > > > [ 90.078723] __warn from warn_slowpath_fmt+0x170/0x178 > > > [ 90.078760] warn_slowpath_fmt from > > > vc4_hdmi_connector_detect_ctx+0x2e4/0x34c [vc4] > > > [ 90.078862] vc4_hdmi_connector_detect_ctx [vc4] from > > > drm_helper_probe_detect_ctx+0x40/0x120 > > > [ 90.078951] drm_helper_probe_detect_ctx from > > > output_poll_execute+0x160/0x24c > > > [ 90.078982] output_poll_execute from process_one_work+0x16c/0x3b4 > > > [ 90.079012] process_one_work from worker_thread+0x270/0x488 > > > [ 90.079036] worker_thread from kthread+0xe0/0xfc > > > [ 90.079060] kthread from ret_from_fork+0x14/0x28 > > > [ 90.079080] Exception stack(0xf0af9fb0 to 0xf0af9ff8) > > > [ 90.079096] 9fa0: 00000000 > > > 00000000 00000000 00000000 > > > [ 90.079113] 9fc0: 00000000 00000000 00000000 00000000 00000000 > > > 00000000 00000000 00000000 > > > [ 90.079129] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > > > [ 90.079141] ---[ end trace 0000000000000000 ]--- > > > [ 90.079155] vc4_hdmi 3f902000.hdmi: vc4_hdmi_connector_detect_ctx: > > > pm_runtime_resume_and_get failed: -13 > > > [ 92.638262] rpi_firmware_set_power: Set HDMI to 1 > > > [ 95.678251] rpi_firmware_set_power: Set VEC to 1 > > > [ 95.678380] PM: noirq resume of devices complete after 9160.930 msecs > > > [ 95.679604] PM: early resume of devices complete after 1.069 msecs > > > [ 95.812230] brcmfmac: brcmf_fw_alloc_request: using > > > brcm/brcmfmac43455-sdio for chip BCM4345/6 > > > [ 95.812282] PM: resume of devices complete after 132.657 msecs > > > [ 95.813246] vc4_hdmi 3f902000.hdmi: Runtime PM usage count underflow! > > > [ 95.814456] OOM killer enabled. > > > [ 95.814466] Restarting tasks ... done. > > > [ 95.817193] random: crng reseeded on system resumption > > > [ 95.819813] rpi_firmware_set_power: Set HDMI to 0 > > > [ 95.827808] PM: suspend exit > > > > > > [1] - https://github.com/raspberrypi/firmware/issues/1894 > > The code itself looks fine to me still, but It's not clear to me why it > > getting called during suspend in the first place. > > This is a good question. I don't have idea why this should be necessary. > > But according to the kernel docs the worker output_poll_execute can be > disabled with drm_kms_helper_poll_disable(). Yeah, we might need to disable polling during suspend. I must admit I don't have much experience with suspend, so I won't be of much help. > Btw do we need to use SET_SYSTEM_SLEEP_PM_OPS here? > > IIRC, it's in the HPD > > interrupt handler path, could it be that the interrupt fires during > > suspend? > > Based on the trace and my further investigations the function is called > vc4_hdmi_connector_detect_ctx every 10s while no HDMI connector is > plugged by output_poll_execute. This has the advantage that also the > GPIOs from the expander could be used as HPD. Since the suspend test is > 5 sec long, there is a high chance for this warning. That's slightly unrelated, but we can actually use that GPIO for HPD since it supports interrupts. I did a patch quite some time ago that was never merged for a good reason, but I can't remember what it was exactly: https://github.com/raspberrypi/linux/pull/4327 > Here is my current workaround: > > drm/vc4: hdmi: Handle error case of pm_runtime_resume_and_get > > The commit 0f5251339eda ("drm/vc4: hdmi: Make sure the controller is > powered in detect") introduced the necessary power management handling > to avoid register access while controller is powered down. > Unfortunately it just print a warning if pm_runtime_resume_and_get() > fails and proceed anyway. > > This could happen during suspend to idle. So we must assume it is unsafe > to access the HDMI register. So bail out properly. > > Fixes: 0f5251339eda ("drm/vc4: hdmi: Make sure the controller is powered > in detect") > Signed-off-by: Stefan Wahren <wahrenst@xxxxxxx> > --- > drivers/gpu/drm/vc4/vc4_hdmi.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c > index d30f8e8e896717..2c1d59a181d833 100644 > --- a/drivers/gpu/drm/vc4/vc4_hdmi.c > +++ b/drivers/gpu/drm/vc4/vc4_hdmi.c > @@ -462,6 +462,7 @@ static int vc4_hdmi_connector_detect_ctx(struct > drm_connector *connector, > { > struct vc4_hdmi *vc4_hdmi = connector_to_vc4_hdmi(connector); > enum drm_connector_status status = connector_status_disconnected; > + int ret; > > /* > * NOTE: This function should really take vc4_hdmi->mutex, but > @@ -474,7 +475,11 @@ static int vc4_hdmi_connector_detect_ctx(struct > drm_connector *connector, > * the lock for now. > */ > > - WARN_ON(pm_runtime_resume_and_get(&vc4_hdmi->pdev->dev)); > + ret = pm_runtime_resume_and_get(&vc4_hdmi->pdev->dev); > + if (ret) { > + DRM_ERROR("Failed to retain HDMI power domain: %d\n", ret); > + return status; > + } > > if (vc4_hdmi->hpd_gpio) { > if (gpiod_get_value_cansleep(vc4_hdmi->hpd_gpio)) > Yeah... I think we should really just fix the "why is detect even called in the first place" issue rather than merge something like that. Maxime
Attachment:
signature.asc
Description: PGP signature