Re: vc4: WARNING during suspend to idle of Raspberry Pi 3 Plus

Stefan Wahren <wahrenst@xxxxxxx> · Thu, 27 Jun 2024 12:53:52 +0200

Am 27.06.24 um 11:58 schrieb Maxime Ripard:
On Tue, Jun 25, 2024 at 04:21:11PM GMT, Stefan Wahren wrote:
Am 24.06.24 um 14:57 schrieb Maxime Ripard:
On Sun, Jun 16, 2024 at 11:27:08AM GMT, Stefan Wahren wrote:
Hi,
i'm currently experiment with suspend to idle on the Raspberry Pi [1].

During my tests, i noticed a WARNING of the vc4 during suspend incl.
Runtime PM usage count underflow. It would be nice if someone can look
at it. In case you want to reproduce it, i can prepare a branch with
some improvements/hacks. For example i disabled dwc2/USB because it
cause a lot of trouble during resume.

Here are the steps to trigger this issue:
- make sure necessary kernel options are enabled ( CONFIG_SUSPEND,
CONFIG_PM_DEBUG, CONFIG_PM_ADVANCED_DEBUG )
- make sure no HDMI cable is connected to Raspberry Pi 3 Plus
sorry, i forgot the specific type of the Raspberry Pi 3 B Plus, but the
issue on Pi 3 A Plus is the same.
- Add "no_console_suspend" to cmdline.txt and reboot
- Connect via Debug UART:

echo 1 > /sys/power/pm_debug_messages
echo platform > /sys/power/pm_test
echo freeze > /sys/power/state

Here is the output:

[   75.538497] PM: suspend entry (s2idle)
[   76.915786] Filesystems sync: 1.377 seconds
[   79.678262] rpi_firmware_set_power: Set HDMI to 1
[   79.678634] rpi_firmware_set_power: Set HDMI to 0
[   79.850949] Freezing user space processes
[   79.852460] Freezing user space processes completed (elapsed 0.001
seconds)
[   79.852484] OOM killer disabled.
[   79.852493] Freezing remaining freezable tasks
[   79.853684] Freezing remaining freezable tasks completed (elapsed
0.001 seconds)
[   80.892819] ieee80211 phy0: brcmf_fil_cmd_data: bus is down. we have
nothing to do.
[   80.892843] ieee80211 phy0: brcmf_cfg80211_get_tx_power: error (-5)
[   81.514053] PM: suspend of devices complete after 1659.336 msecs
[   81.514089] PM: start suspend of devices complete after 1660.386 msecs
[   81.515616] PM: late suspend of devices complete after 1.509 msecs
[   81.515938] rpi_firmware_set_power: Set VEC to 0
[   81.516010] rpi_firmware_set_power: Set V3D to 0
[   81.516998] PM: noirq suspend of devices complete after 1.239 msecs
[   81.517016] PM: suspend debug: Waiting for 5 second(s).
[   89.598310] rpi_firmware_set_power: Set V3D to 1
[   90.078228] ------------[ cut here ]------------
[   90.078240] WARNING: CPU: 1 PID: 216 at
drivers/gpu/drm/vc4/vc4_hdmi.c:477
vc4_hdmi_connector_detect_ctx+0x2e4/0x34c [vc4]
[   90.078344] Modules linked in: aes_arm aes_generic cbc crypto_simd
cryptd algif_skcipher af_alg brcmfmac_wcc brcmfmac vc4 brcmutil
snd_soc_hdmi_codec snd_soc_core ac97_bus sha256_generic
snd_pcm_dmaengine libsha256 snd_pcm sha256_arm snd_timer hci_uart
cfg80211 btbcm snd bluetooth soundcore drm_dma_helper crc32_arm_ce
ecdh_generic ecc raspberrypi_hwmon libaes bcm2835_thermal
[   90.078551] CPU: 1 PID: 216 Comm: kworker/1:2 Not tainted 6.9.3-dirty #30
[   90.078568] Hardware name: BCM2835
[   90.078580] Workqueue: events output_poll_execute
[   90.078610] Call trace:
[   90.078624]  unwind_backtrace from show_stack+0x10/0x14
[   90.078660]  show_stack from dump_stack_lvl+0x50/0x64
[   90.078688]  dump_stack_lvl from __warn+0x7c/0x124
[   90.078723]  __warn from warn_slowpath_fmt+0x170/0x178
[   90.078760]  warn_slowpath_fmt from
vc4_hdmi_connector_detect_ctx+0x2e4/0x34c [vc4]
[   90.078862]  vc4_hdmi_connector_detect_ctx [vc4] from
drm_helper_probe_detect_ctx+0x40/0x120
[   90.078951]  drm_helper_probe_detect_ctx from
output_poll_execute+0x160/0x24c
[   90.078982]  output_poll_execute from process_one_work+0x16c/0x3b4
[   90.079012]  process_one_work from worker_thread+0x270/0x488
[   90.079036]  worker_thread from kthread+0xe0/0xfc
[   90.079060]  kthread from ret_from_fork+0x14/0x28
[   90.079080] Exception stack(0xf0af9fb0 to 0xf0af9ff8)
[   90.079096] 9fa0:                                     00000000
00000000 00000000 00000000
[   90.079113] 9fc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[   90.079129] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   90.079141] ---[ end trace 0000000000000000 ]---
[   90.079155] vc4_hdmi 3f902000.hdmi: vc4_hdmi_connector_detect_ctx:
pm_runtime_resume_and_get failed: -13
[   92.638262] rpi_firmware_set_power: Set HDMI to 1
[   95.678251] rpi_firmware_set_power: Set VEC to 1
[   95.678380] PM: noirq resume of devices complete after 9160.930 msecs
[   95.679604] PM: early resume of devices complete after 1.069 msecs
[   95.812230] brcmfmac: brcmf_fw_alloc_request: using
brcm/brcmfmac43455-sdio for chip BCM4345/6
[   95.812282] PM: resume of devices complete after 132.657 msecs
[   95.813246] vc4_hdmi 3f902000.hdmi: Runtime PM usage count underflow!
[   95.814456] OOM killer enabled.
[   95.814466] Restarting tasks ... done.
[   95.817193] random: crng reseeded on system resumption
[   95.819813] rpi_firmware_set_power: Set HDMI to 0
[   95.827808] PM: suspend exit

[1] - https://github.com/raspberrypi/firmware/issues/1894
The code itself looks fine to me still, but It's not clear to me why it
getting called during suspend in the first place.
This is a good question. I don't have idea why this should be necessary.

But according to the kernel docs the worker output_poll_execute can be
disabled with drm_kms_helper_poll_disable().
Yeah, we might need to disable polling during suspend.

Agree. Yesterday i managed to get s2idle working on a old Raspberry Pi 1
B, so i will prepare a patch for this. It's not the only driver for
BCM2835, which need suspend patches.

I must admit I
don't have much experience with suspend, so I won't be of much help.

Btw do we need to use SET_SYSTEM_SLEEP_PM_OPS here?
   IIRC, it's in the HPD
interrupt handler path, could it be that the interrupt fires during
suspend?
Based on the trace and my further investigations the function is called
vc4_hdmi_connector_detect_ctx every 10s while no HDMI connector is
plugged by output_poll_execute. This has the advantage that also the
GPIOs from the expander could be used as HPD. Since the suspend test is
5 sec long, there is a high chance for this warning.
That's slightly unrelated, but we can actually use that GPIO for HPD
since it supports interrupts. I did a patch quite some time ago that was
never merged for a good reason, but I can't remember what it was
exactly:

https://github.com/raspberrypi/linux/pull/4327
Yes, the commit "pinctrl: bcm2835: Disable interrupts for the banks > 0"
is not really acceptable. From my suspend investigations so far, i can
tell there some issues with the irqchip driver(s). So this commit looks
like a workaround.

Here is my current workaround:

drm/vc4: hdmi: Handle error case of pm_runtime_resume_and_get

The commit 0f5251339eda ("drm/vc4: hdmi: Make sure the controller is
powered in detect") introduced the necessary power management handling
to avoid register access while controller is powered down.
Unfortunately it just print a warning if pm_runtime_resume_and_get()
fails and proceed anyway.

This could happen during suspend to idle. So we must assume it is unsafe
to access the HDMI register. So bail out properly.

Fixes: 0f5251339eda ("drm/vc4: hdmi: Make sure the controller is powered
in detect")
Signed-off-by: Stefan Wahren <wahrenst@xxxxxxx>
---
  drivers/gpu/drm/vc4/vc4_hdmi.c | 7 ++++++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c
index d30f8e8e896717..2c1d59a181d833 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.c
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.c
@@ -462,6 +462,7 @@ static int vc4_hdmi_connector_detect_ctx(struct
drm_connector *connector,
  {
      struct vc4_hdmi *vc4_hdmi = connector_to_vc4_hdmi(connector);
      enum drm_connector_status status = connector_status_disconnected;
+    int ret;

      /*
       * NOTE: This function should really take vc4_hdmi->mutex, but
@@ -474,7 +475,11 @@ static int vc4_hdmi_connector_detect_ctx(struct
drm_connector *connector,
       * the lock for now.
       */

- WARN_ON(pm_runtime_resume_and_get(&vc4_hdmi->pdev->dev));
+    ret = pm_runtime_resume_and_get(&vc4_hdmi->pdev->dev);
+    if (ret) {
+        DRM_ERROR("Failed to retain HDMI power domain: %d\n", ret);
+        return status;
+    }

      if (vc4_hdmi->hpd_gpio) {
          if (gpiod_get_value_cansleep(vc4_hdmi->hpd_gpio))

Yeah... I think we should really just fix the "why is detect even called
in the first place" issue rather than merge something like that.
Yes, but i think the WARN_ON [1] and the missing error handling of
pm_runtime_resume_and_get is still an issue.

Best regards

[1] - https://lwn.net/Articles/969923/

Maxime