Hi Shimoda-san,

On Mon, Jul 6, 2020 at 1:14 PM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> On Fri, Jul 3, 2020 at 1:10 PM Yoshihiro Shimoda
> <yoshihiro.shimoda.uh@xxxxxxxxxxx> wrote:
> > > From: Geert Uytterhoeven, Sent: Tuesday, June 30, 2020 10:19 PM
> > > On Mon, Jun 29, 2020 at 1:49 PM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > > > On Mon, Jun 29, 2020 at 12:04 PM Yoshihiro Shimoda
> > > > <yoshihiro.shimoda.uh@xxxxxxxxxxx> wrote:
> > > > > > From: Geert Uytterhoeven, Sent: Friday, June 26, 2020 7:13 PM
> > > > > > On Fri, Jun 26, 2020 at 11:32 AM Yoshihiro Shimoda
> > > > > > <yoshihiro.shimoda.uh@xxxxxxxxxxx> wrote:
> > > > > > > Note that v5.8-rc2 with r8a77951-salvator-xs seems to cause panic from
> > > > > > > PCI driver when the system is suspended. So, I disabled the PCI
> > > > > > > devices when I tested this patch series.
> > > > > >
> > > > > > Does this happen with current renesas-devel and renesas_defconfig?
> > > > > > (it doesn't for me)
> > > > >
> > > > > Yes. I enabled PM_DEBUG and E1000E though.
> > > > >
> > > > > > Do you have any PCIe devices attached? (I haven't)
> > > > >
> > > > > Yes. (Intel Ethernet card is connected to the PCI slot.)
> > > > >
> > > > > < my environment >
> > > > > - r8a77961-salvator-xs
> > > > > - renesas-devel-2020-06-26-v5.8-rc2
> > > > > + renesas_defconfig + PM_DEBUG + E1000E
> > > > > - initramfs
> > > >
> > > > Doesn't fail for me on R-Car H3 ES2.0, so it needs the presence of a
> > > > PCIe card. Unfortunately I haven't any (added to shopping wishlist).

"Intel Corporation 82574L Gigabit Network Connection" arrived and
installed on local Salvator-X with M3-W.

> > >
> > > [...]
> > >
> > > > The failure mode looks like the PCI card is accessed while the PCI host
> > > > bridge has been suspended.
> > >
> > > Does "[PATCH v1] driver core: Fix suspend/resume order issue with
> > > deferred probe"[1] help?
> > >
> > > [1] https://lore.kernel.org/lkml/20200625032430.152447-1-saravanak@xxxxxxxxxx/
> >
> > Even if I applied this patch, the issue still happened unfortunately.
>
> OK.
>
> I managed to reproduce it on the M3-W+ in Magnus' farm.

And on my local M3-W.

> > By the way, I'm guessing the issue is related to my environment which uses BSP's ATF.
> > According to the commit log of upstream ATF [1], PCIe hardware is possible to causes SError.
> > Unfortunately, I cannot try to update the firmware for some reasons now... I'll prepare
> > updated firmware somehow...
>
> I don't think it's firmware-related. The issue happens in the PCI
> suspend_noirq callback, which is called during late system suspend.

You were right. It turns out the ATF on my M3-W board was two weeks
too old to have commit 0969397f295621aa ("rcar_gen3: plat: Prevent PCIe
hang during L1X config access"). Updating all firmware components to
today's versions fixed that, and both s2idle and s2ram now work fine.

I assume the same would be true for M3-W+, so case closed (for R-Car
Gen3)?

> Anyone who can reproduce this on a different board, also on R-Car Gen2
> or even R-Car H1?
>
>     Intel E1000E card with CONFIG_E1000E=y
>
>     echo 0 > /sys/module/printk/parameters/console_suspend
>     echo mem > /sys/power/state

I moved the E1000E card to an R-Car Gen2 board (r8a7791/koelsch), and
s2idle crashes in a similar way:

    Unhandled fault: asynchronous external abort (0x1211) at 0x00000000
    pgd = ceadf1f8
    [00000000] *pgd=80000040004003, *pmd=00000000
    Internal error: : 1211 [#1] SMP ARM
    Modules linked in:
    CPU: 0 PID: 124 Comm: kworker/u4:6 Not tainted 5.8.0-koelsch-00539-gce07c9ba6e9f601c #867
    Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
    Workqueue: events_unbound async_run_entry_fn
    PC is at rcar_pcie_config_access+0x10c/0x13c
    LR is at rcar_pcie_config_access+0x10c/0x13c
    pc : [<c04a4ab4>]    lr : [<c04a4ab4>]    psr: 60000093
    sp : e67b3e00  ip : 00000000  fp : 00000000
    r10: 00000000  r9 : 00000000  r8 : e7369800
    r7 : 00000000  r6 : e67b3e40  r5 : e7369640  r4 : 000000cc
    r3 : f0900000  r2 : f0900018  r1 : f0900020  r0 : 00000000
    Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
    Control: 30c5387d  Table: 648fe480  DAC: fffffffd
    Process kworker/u4:6 (pid: 124, stack limit = 0x0dcce627)
    Stack: (0xe67b3e00 to 0xe67b4000)
    ...
    [<c04a4ab4>] (rcar_pcie_config_access) from [<c04a4be0>] (rcar_pcie_read_conf+0x28/0x80)
    [<c04a4be0>] (rcar_pcie_read_conf) from [<c048a4e0>] (pci_bus_read_config_word+0x68/0xa8)
    [<c048a4e0>] (pci_bus_read_config_word) from [<c0490030>] (pci_raw_set_power_state+0x18c/0x270)
    [<c0490030>] (pci_raw_set_power_state) from [<c0492e20>] (pci_set_power_state+0x98/0xcc)
    [<c0492e20>] (pci_set_power_state) from [<c0492ea0>] (pci_prepare_to_sleep+0x4c/0x6c)
    [<c0492ea0>] (pci_prepare_to_sleep) from [<c0496c84>] (pci_pm_suspend_noirq+0x228/0x244)
    [<c0496c84>] (pci_pm_suspend_noirq) from [<c0509d88>] (dpm_run_callback.part.5+0x44/0xac)
    [<c0509d88>] (dpm_run_callback.part.5) from [<c050b38c>] (__device_suspend_noirq+0x74/0x1a4)

> Why haven't we seen this before?
> I can reproduce the issue on v5.5 (first version that supported M3-W+,
> but needs backported DTS for PCIe support) and later.

On Koelsch, I could easily reproduce this on v4.10 and later[1].

As no firmware is involved this time, I guess Linux itself needs to
become aware of this issue, and handle it in a similar way to ATF on
arm64[2]?

[1] Using shmobile_defconfig + CONFIG_NET_VENDOR_INTEL=y + CONFIG_E1000E=y.
    v4.10 needs CONFIG_PCI_MSI=y + CONFIG_PCIE_RCAR=y, too.
    Older kernels are not compatible with my Debian (systemd!) nfsroot
    userland.
[2] https://github.com/ARM-software/arm-trusted-firmware/commit/0969397f295621aa26b3d14b76dd397d22be58bf

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds