Re: [BUG] rk3399 fails to reboot correctly with PCIE device inserted


 



On Wed, Dec 4, 2019 at 1:28 PM Peter Geis <pgwipeout@xxxxxxxxx> wrote:
>
> On Wed, Dec 4, 2019 at 12:42 PM Robin Murphy <robin.murphy@xxxxxxx> wrote:
> >
> > On 04/12/2019 5:28 pm, Peter Geis wrote:
> > > On Mon, Nov 25, 2019 at 7:05 PM Peter Geis <pgwipeout@xxxxxxxxx> wrote:
> > >>
> > >> On Mon, Nov 25, 2019 at 12:10 PM Peter Geis <pgwipeout@xxxxxxxxx> wrote:
> > >>>
> > >>> On Mon, Nov 25, 2019 at 11:52 AM Robin Murphy <robin.murphy@xxxxxxx> wrote:
> > >>>>
> > >>>> Hi Peter,
> > >>>>
> > >>>> On 25/11/2019 4:28 pm, Peter Geis wrote:
> > >>>>> Good Morning,
> > >>>>>
> > >>>>> Another issue I've come across while testing PCIe on the RockPro64:
> > >>>>> when a PCIe device is inserted into the board and enumerates
> > >>>>> successfully, the board will not reset.
> > >>>>> I've tried various versions of u-boot-rockchip and u-boot-mainline, with
> > >>>>> both miniloader and TPL/SPL.
> > >>>>
> > >>>> In case it's relevant, what particular PCIe device(s) have you seen the
> > >>>> issue with? FWIW I've been running a Samsung 960 Evo NVMe in my
> > >>>> NanoPC-T4 with mainline kernels for months now and it's always rebooted
> > >>>> flawlessly.
> > >>>>
> > >>>> Robin.
> > >>>
> > >>> Currently with an I350 NIC, but I've also observed it with a PCIe switch and the GTX645.
> > >>> The NIC works, while the other two didn't work without the patch to hijack
> > >>> the error handler.
> > >>>
> > >>> I am running the latest ATF built from their GitHub.
> > >>
> > >> On closer examination, it isn't the PCIe devices causing the reboot
> > >> issues; the rk3399 just doesn't reboot.
> > >> It would seem the trigger with miniloader was random enough that it
> > >> appeared to be tied to my PCIe testing.
> > >> It happens 100% of the time with TPL/SPL.
> > >
> > > With further testing, I think I've found the trigger of the reboot failure.
> > > It would seem that with ATF compiled from source, psci-reboot is not
> > > actually triggering the reboot.
> > > My board stopped rebooting entirely because I had
> > > somehow broken the psci-watchdog.
> > >
> > > I rebuilt everything from source, stripping all modifications I had made and
> > > using the defconfigs.
> > > I get the following message at reboot time:
> > > [ 2839.724508] watchdog: watchdog0: watchdog did not stop!
> > > [ 2841.162516] reboot: Restarting system
> > > U-Boot TPL 2020.01-rc3-00070-g9a0cbae22a-dirty (Dec 03 2019 - 14:07:57)
> > > Previously, the watchdog alert was not triggering and the reboot never occurred.
> > >
> > > It would seem that the psci-reboot function is dead, and the only
> > > reason the boards actually reboot is that the psci-watchdog
> > > triggers the reboot after the kernel fails to check in.
> > >
> > > I am still seeing the boot hang after a warm reboot
> > > when certain PCIe devices are installed (particularly the i350
> > > network controller).
> > > I think this may be due to the PCIe controller driver lacking proper
> > > shutdown cleanup code, which allows the i350 to keep
> > > triggering either interrupts or DMA transfers after the soft reboot.
> > >
> > > The hang occurs at roughly the same point, when either the IOMMU or the
> > > first DMA device is initialized.
> > > Occasionally the A72 cluster fails to initialize as well.
> >
> > It turns out there's been a general issue with upstream ATF failing to
> > reboot RK3399 correctly, which has just been tracked down to power
> > domain states getting out of sync - there's more info on the U-Boot list
> > here: https://lists.denx.de/pipermail/u-boot/2019-December/392348.html
> >
> > Robin.
>
> Thanks!
> It seems there were two issues here, both involving the power bugs I've
> been tracking.
>
> First, there was no sanity check for whether a power-off or reset
> gpio existed before trying to get the gpio.
> This broke the reset and poweroff functions on boards without reset or
> power-off gpios.
> The fix they implemented is to try to set the gpio value before
> getting the gpio; this fails if the gpio doesn't exist, returning
> NULL in that case.
> This fix has been merged as of last night.
>
> The power domain issue hasn't been merged yet, but I've grabbed that
> patch and will test it as well.

I've confirmed the reset function works after the gpio patch that was
just merged [0].
I've also confirmed that the lockup after a soft reset is resolved by this patch [1].
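For anyone following along, here is a minimal sketch of the sanity-check pattern described above: an optional gpio lookup that returns NULL when the board has no such gpio, with the caller falling back instead of using it. This is a toy model in C, not the actual ATF code from [0]; struct board_gpio, lookup_reset_gpio() and try_gpio_reset() are hypothetical names, and the "gpio write" is only simulated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of an optional board gpio (reset or power-off). */
struct board_gpio {
	int index;    /* gpio number; negative means "not wired on this board" */
	int polarity; /* level to drive to trigger the action */
};

/* Returns NULL when the board has no such gpio, instead of a bogus pointer. */
const struct board_gpio *lookup_reset_gpio(const struct board_gpio *cfg)
{
	if (cfg == NULL || cfg->index < 0)
		return NULL;
	return cfg;
}

/*
 * Records the level that would be driven on the reset gpio, if one exists,
 * and reports whether the gpio path was taken. A false return means the
 * caller should fall back to another reset mechanism.
 */
bool try_gpio_reset(const struct board_gpio *cfg, int *driven_level)
{
	const struct board_gpio *gpio;

	assert(driven_level != NULL);
	gpio = lookup_reset_gpio(cfg);
	if (gpio == NULL)
		return false; /* sanity check: no gpio, nothing to drive */
	*driven_level = gpio->polarity; /* stands in for the real gpio write */
	return true;
}
```

The point of the pattern is that a board without the gpio takes the fallback path rather than breaking reset entirely, which matches the behaviour change reported for [0].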

The power-off issue still exists. I dug into the PSCI PM code for
the poweroff function, and unless there is a gpio, this function is a
no-op.
For this reason, I think the rk808 driver should be modified to set
itself as the primary poweroff provider if the
rockchip,system-power-controller flag is set.
The other option is to somehow make ATF aware of the rk808 and have it
trigger the poweroff.
Thoughts on this?
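To make the two options concrete, here is a toy C model of the power-off decision, assuming only the behaviour described above: the firmware path does nothing without a power-off gpio, while under the proposal the rk808 driver takes over whenever the PMIC node carries rockchip,system-power-controller. The enum and function names are hypothetical; this is not kernel or ATF code.

```c
#include <assert.h>
#include <stdbool.h>

/* Which mechanism (if any) actually powers the board off. */
enum poweroff_result {
	POWEROFF_NOOP, /* nothing happens: the current symptom */
	POWEROFF_GPIO, /* firmware drives a board power-off gpio */
	POWEROFF_PMIC, /* rk808 PMIC turns the rails off */
};

/* Hypothetical model of the firmware path: a no-op without a power-off gpio. */
enum poweroff_result firmware_poweroff(bool has_poweroff_gpio)
{
	return has_poweroff_gpio ? POWEROFF_GPIO : POWEROFF_NOOP;
}

/*
 * Hypothetical model of the proposal: if the PMIC is flagged as the system
 * power controller, it is the primary provider; otherwise fall back to the
 * firmware path.
 */
enum poweroff_result system_poweroff(bool pmic_is_system_power_controller,
				     bool has_poweroff_gpio)
{
	if (pmic_is_system_power_controller)
		return POWEROFF_PMIC;
	return firmware_poweroff(has_poweroff_gpio);
}
```

In this model, a board with no power-off gpio and no PMIC flag ends up at POWEROFF_NOOP, which is the case being hit today; the alternative of making ATF aware of the rk808 would instead move the PMIC branch into firmware_poweroff().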

[0] https://github.com/ARM-software/arm-trusted-firmware/commit/45d4611563038486890b40d61e41b68213326afc
[1] https://github.com/armbian/build/blob/master/patch/atf/atf-rk3399/switch-power-domains-on-before-reset.patch
>
> >
> > >
> > > Log is below:
> > > [    0.261198] Detected PIPT I-cache on CPU5
> > > [    0.261223] GICv3: CPU5: found redistributor 101 region 0:0x00000000fefa0000
> > > [    0.261235] GICv3: CPU5: using allocated LPI pending table @0x00000000f0120000
> > > [    0.261263] CPU5: Booted secondary processor 0x0000000101 [0x410fd082]
> > > [    0.261377] smp: Brought up 1 node, 6 CPUs
> > > [    0.274833] SMP: Total of 6 processors activated.
> > > [    0.275297] CPU features: detected: 32-bit EL0 Support
> > > [    0.275801] CPU features: detected: CRC32 instructions
> > > [    0.290797] CPU: All CPU(s) started at EL2
> > > [    0.291242] alternatives: patching kernel code
> > > [    0.294848] devtmpfs: initialized
> > > [    0.311658] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
> > > [    0.312629] futex hash table entries: 2048 (order: 5, 131072 bytes, linear)
> > > [    0.315223] pinctrl core: initialized pinctrl subsystem
> > > [    0.318097] DMI not present or invalid.
> > > [    0.318989] NET: Registered protocol family 16
> > > [    0.326798] DMA: preallocated 256 KiB pool for atomic allocations
> > > [    0.327415] audit: initializing netlink subsys (disabled)
> > > [    0.328106] audit: type=2000 audit(0.320:1): state=initialized audit_enabled=0 res=1
> > > [    0.330213] cpuidle: using governor menu
> > > [    0.331160] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
> > > [    0.334653] Serial: AMBA PL011 UART driver
> > > [    0.384125] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
> > > [    0.384800] HugeTLB registered 32.0 MiB page size, pre-allocated 0 pages
> > > [    0.385483] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
> > > [    0.386146] HugeTLB registered 64.0 KiB page size, pre-allocated 0 pages
> > > [    0.390063] cryptd: max_cpu_qlen set to 1000
> > > [    0.396205] ACPI: Interpreter disabled.
> > > [    0.399113] vcc3v3_pcie: supplied by vcc12v_dcin
> > > [    0.400706] vcc5v0_sys: supplied by vcc12v_dcin
> > > [    0.401426] vcc5v0_usb: supplied by vcc12v_dcin
> > > [    0.402060] vcc3v3_sys: supplied by vcc5v0_sys
> > > [    0.403275] iommu: Default domain type: Translated
> > > [
> > >
> > >>
> > >>>
> > >>>>
> > >>>>> With miniloader and both variants of u-boot, if you attempt a reboot
> > >>>>> it never fires the "reboot: Restarting system" message.
> > >>>>> If you trigger a sysrq reboot at this stage, it will reboot, but fails
> > >>>>> to start up the two A72 cores and subsequently hangs a second later
> > >>>>> when it loads the first DMA driver.
> > >>>>>
> > >>>>> With TPL/SPL on mainline-u-boot (I can't get rockchip-u-boot to work
> > >>>>> with TPL/SPL), it fires the "reboot: Restarting system" message, but
> > >>>>> never reboots.
> > >>>>> sysrq does not function at this point.
> > >>>>>
> > >>>>> I believe the PCIe controller is not being halted and gets stuck in a
> > >>>>> loop with the two A72 cores.
> > >>>>>
> > >>>>> _______________________________________________
> > >>>>> Linux-rockchip mailing list
> > >>>>> Linux-rockchip@xxxxxxxxxxxxxxxxxxx
> > >>>>> http://lists.infradead.org/mailman/listinfo/linux-rockchip
> > >>>>>




