E1000 s2idle crash (was: Re: [PATCH/RFC v4 0/4] treewide: add regulator condition on _mmc_suspend())

Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> · Tue, 11 Aug 2020 15:50:13 +0200

Hi Shimoda-san,

On Mon, Jul 6, 2020 at 1:14 PM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> On Fri, Jul 3, 2020 at 1:10 PM Yoshihiro Shimoda
> <yoshihiro.shimoda.uh@xxxxxxxxxxx> wrote:
> > > From: Geert Uytterhoeven, Sent: Tuesday, June 30, 2020 10:19 PM
> > > On Mon, Jun 29, 2020 at 1:49 PM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > > > On Mon, Jun 29, 2020 at 12:04 PM Yoshihiro Shimoda
> > > > <yoshihiro.shimoda.uh@xxxxxxxxxxx> wrote:
> > > > > > From: Geert Uytterhoeven, Sent: Friday, June 26, 2020 7:13 PM
> > > > > > On Fri, Jun 26, 2020 at 11:32 AM Yoshihiro Shimoda
> > > > > > <yoshihiro.shimoda.uh@xxxxxxxxxxx> wrote:
> > > > > > > Note that v5.8-rc2 with r8a77951-salvator-xs seems to cause panic from
> > > > > > > PCI driver when the system is suspended. So, I disabled the PCI
> > > > > > > devices when I tested this patch series.
> > > > > >
> > > > > > Does this happen with current renesas-devel and renesas_defconfig?
> > > > > > (it doesn't for me)
> > > > >
> > > > > Yes. I enabled PM_DEBUG and E1000E though.
> > > > >
> > > > > > Do you have any PCIe devices attached? (I haven't)
> > > > >
> > > > > Yes. (Intel Ethernet card is connected to the PCI slot.)
> > > > >
> > > > > < my environment >
> > > > > - r8a77961-salvator-xs
> > > > > - renesas-devel-2020-06-26-v5.8-rc2
> > > > >  + renesas_defconfig + PM_DEBUG + E1000E
> > > > > - initramfs
> > > >
> > > > Doesn't fail for me on R-Car H3 ES2.0, so it needs the presence of a
> > > > PCIe card.  Unfortunately I haven't any (added to shopping wishlist).

"Intel Corporation 82574L Gigabit Network Connection" arrived and installed
on local Salvator-X with M3-W.

> > >
> > > [...]
> > >
> > > > The failure mode looks like the PCI card is accessed while the PCI host
> > > > bridge has been suspended.
> > >
> > > Does "[PATCH v1] driver core: Fix suspend/resume order issue with
> > > deferred probe"[1] help?
> > >
> > > [1] https://lore.kernel.org/lkml/20200625032430.152447-1-saravanak@xxxxxxxxxx/
> >
> > Even with this patch applied, the issue still happens, unfortunately.
>
> OK.
>
> I managed to reproduce it on the M3-W+ in Magnus' farm.

And on my local M3-W.

> > By the way, I suspect the issue is related to my environment, which uses the BSP's ATF.
> > According to the commit log of upstream ATF [1], the PCIe hardware can cause an SError.
> > Unfortunately, I cannot update the firmware right now for various reasons... I'll prepare
> > updated firmware somehow...
>
> I don't think it's firmware-related.  The issue happens in the PCI
> suspend_noirq callback, which is called during late system suspend.

You were right. It turns out the ATF on my M3-W board was two weeks too
old to have commit 0969397f295621aa ("rcar_gen3: plat: Prevent PCIe hang
during L1X config access"). Updating all firmware components to today's
versions fixed that, and both s2idle and s2ram now work fine.

I assume the same would be true for M3-W+, so case closed (for R-Car Gen3)?

> Anyone who can reproduce this on a different board, also on R-Car Gen2
> or even R-Car H1?
>
>     Intel E1000E card with CONFIG_E1000E=y
>
>     echo 0 > /sys/module/printk/parameters/console_suspend
>     echo mem > /sys/power/state
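
For anyone trying this on another board, the quoted steps can be wrapped in
a small helper. This is only a sketch under a few assumptions: the function
name suspend_once and the SYSFS_ROOT variable are made up for illustration
(SYSFS_ROOT merely allows a dry run against a scratch directory), and
selecting the suspend variant through /sys/power/mem_sleep assumes a kernel
that provides that file and a platform that supports the requested mode.

```shell
#!/bin/sh
# Sketch of the reproduction steps quoted above.
# SYSFS_ROOT is a hypothetical override for dry runs, not a kernel interface.

# suspend_once MODE
#   MODE is "s2idle" or "deep" (s2ram), as accepted by /sys/power/mem_sleep.
suspend_once() {
    root="${SYSFS_ROOT:-/sys}"
    mode="${1:-s2idle}"

    # Keep the console active during suspend so that a crash is visible.
    echo 0 > "$root/module/printk/parameters/console_suspend" || return 1

    # Select which suspend variant "mem" maps to.
    echo "$mode" > "$root/power/mem_sleep" || return 1

    # Trigger the suspend; on real hardware this returns after resume.
    echo mem > "$root/power/state"
}
```

On a real board this would be run as root, e.g. "suspend_once s2idle"
followed by "suspend_once deep", with the Intel E1000E card plugged in and
CONFIG_E1000E=y.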

I moved the E1000E card to an R-Car Gen2 board (r8a7791/koelsch), and
s2idle crashes in a similar way:

    Unhandled fault: asynchronous external abort (0x1211) at 0x00000000
    pgd = ceadf1f8
    [00000000] *pgd=80000040004003, *pmd=00000000
    Internal error: : 1211 [#1] SMP ARM
    Modules linked in:
    CPU: 0 PID: 124 Comm: kworker/u4:6 Not tainted 5.8.0-koelsch-00539-gce07c9ba6e9f601c #867
    Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
    Workqueue: events_unbound async_run_entry_fn
    PC is at rcar_pcie_config_access+0x10c/0x13c
    LR is at rcar_pcie_config_access+0x10c/0x13c
    pc : [<c04a4ab4>]    lr : [<c04a4ab4>]    psr: 60000093
    sp : e67b3e00  ip : 00000000  fp : 00000000
    r10: 00000000  r9 : 00000000  r8 : e7369800
    r7 : 00000000  r6 : e67b3e40  r5 : e7369640  r4 : 000000cc
    r3 : f0900000  r2 : f0900018  r1 : f0900020  r0 : 00000000
    Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
    Control: 30c5387d  Table: 648fe480  DAC: fffffffd
    Process kworker/u4:6 (pid: 124, stack limit = 0x0dcce627)
    Stack: (0xe67b3e00 to 0xe67b4000)
    ...
    [<c04a4ab4>] (rcar_pcie_config_access) from [<c04a4be0>] (rcar_pcie_read_conf+0x28/0x80)
    [<c04a4be0>] (rcar_pcie_read_conf) from [<c048a4e0>] (pci_bus_read_config_word+0x68/0xa8)
    [<c048a4e0>] (pci_bus_read_config_word) from [<c0490030>] (pci_raw_set_power_state+0x18c/0x270)
    [<c0490030>] (pci_raw_set_power_state) from [<c0492e20>] (pci_set_power_state+0x98/0xcc)
    [<c0492e20>] (pci_set_power_state) from [<c0492ea0>] (pci_prepare_to_sleep+0x4c/0x6c)
    [<c0492ea0>] (pci_prepare_to_sleep) from [<c0496c84>] (pci_pm_suspend_noirq+0x228/0x244)
    [<c0496c84>] (pci_pm_suspend_noirq) from [<c0509d88>] (dpm_run_callback.part.5+0x44/0xac)
    [<c0509d88>] (dpm_run_callback.part.5) from [<c050b38c>] (__device_suspend_noirq+0x74/0x1a4)

> Why haven't we seen this before?
> I can reproduce the issue on v5.5 (first version that supported M3-W+,
> but needs backported DTS for PCIe support) and later.

On Koelsch, I could easily reproduce this on v4.10 and later[1].

As no firmware is involved this time, I guess Linux itself needs to
become aware of this issue, and handle it in a similar way to ATF
on arm64[2]?

[1] Using shmobile_defconfig + CONFIG_NET_VENDOR_INTEL=y + CONFIG_E1000E=y.
    v4.10 needs CONFIG_PCI_MSI=y + CONFIG_PCIE_RCAR=y, too.
    Older kernels are not compatible with my Debian (systemd!) nfsroot
    userland.

[2] https://github.com/ARM-software/arm-trusted-firmware/commit/0969397f295621aa26b3d14b76dd397d22be58bf

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds