From: Sean Wang <sean.wang@xxxxxxxxxxxx>

>On Thu, 2022-03-24 at 10:13 +0100, Íñigo Huguet wrote:
>> On Wed, Dec 22, 2021 at 12:52 PM Philippe Schenker <dev@xxxxxxxxxxxx>
>> wrote:
>> >
>> > Hello
>> >
>> > So I received a new notebook recently, this is a Lenovo P14s that
>> > has a Mediatek 7961 network controller inside.
>> >
>> > -----
>> >
>> > 03:00.0 Network controller: MEDIATEK Corp. Device 7961
>> >         Subsystem: Lenovo Device e0bc
>> >         Physical Slot: 0
>> >         Flags: bus master, fast devsel, latency 0, IRQ 91, IOMMU group 13
>> >         Memory at 870200000 (64-bit, prefetchable) [size=1M]
>> >         Memory at 870300000 (64-bit, prefetchable) [size=16K]
>> >         Memory at 870304000 (64-bit, prefetchable) [size=4K]
>> >         Capabilities: <access denied>
>> >         Kernel driver in use: mt7921e
>> >         Kernel modules: mt7921e
>> > ------
>> >
>> > I have the issue that on 5.16-rc6 kernel (also on other rcs) it is
>> > always freezing after I issue a "reboot" command. "poweroff"
>> > followed by
>> > a normal power-on works always.
>>
>> I have a bug report with this same behaviour and almost identical
>> kernel logs: message "Timeout for driver own" followed by traces
>> related to mt7921 dma stuff, indicating bad page state with refcount
>> -1 and "page dumped because: nonzero _refcount", finally causing a
>> crash during boot up, but only after reboot, not after normal power
>> on.
>>
>> It happens always, even with v5.17. Commit 602cc0c9618a (mt76:
>> mt7921e: fix possible probe failure after reboot) doesn't fix the
>> issue.
>>
>> I hadn't been able to verify where the problem exactly is, but my
>> guess is this:
>> - In function mt7921_init_hardware, initialization fails because
>> mt7921e_driver_own doesn't finish before the timeout (thus we see the
>> "Timeout for driver own")
>> - Then, before retrying to init, mt7921_init_hardware calls
>> mt7921e_init_reset, and the latter calls mt7921_wpdma_reset
>> - That makes a cleanup of the DMA queues before stopping the DMA,
>> which had been enabled shortly before during probe
>> - Then, my guess is that in the meanwhile, a DMA event arrives with
>> the queues still being cleaned up
>>
>> Does it make sense?
>
>After your suggestion I went down the rabbit-hole and bisected this issue.
>Fortunately I found the commit introducing the issue. Reverting this commit
>solves the problem for me on v5.17. It is related to the PCIe ASPM feature.
>
># first bad commit: [bf3747ae2e25dda6a9e6c464a717c66118c588c8] mt76:
>mt7921: enable aspm by default

Have you tried the latest firmware to see if it helps with the issue? For example:
https://patchwork.kernel.org/project/linux-mediatek/patch/8e8a3e94ffe7586cec5abe56ba507e1e3ed8b823.1648171096.git.objelf@xxxxxxxxx/

>
>@Felix do I have to report this anywhere else than on here?
>
>Thanks,
>Philippe
>
>>
>> >
>> > Since it freezes and showing multiple Call Traces I included 4 logs
>> > in the attachment, it certainly points always to mt76_dma functions.

<snip>
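
Since the bisect points at ASPM, it may help to confirm whether ASPM is actually enabled on the link before and after any firmware or kernel change. Below is a minimal sketch, assuming the usual pciutils `lspci -vv` output format with a `LnkCtl:` line. The function name `aspm_state` and the sample text are illustrative only and are not taken from the reporter's machine.

```python
import re


def aspm_state(lspci_vv_text):
    """Return the ASPM state from the first LnkCtl line, or None if absent."""
    # LnkCtl lines typically look like:
    #   LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
    match = re.search(r"LnkCtl:\s*ASPM ([^;]+);", lspci_vv_text)
    return match.group(1).strip() if match else None


# Illustrative sample only (hypothetical values, not from the report above).
SAMPLE = (
    "03:00.0 Network controller: MEDIATEK Corp. Device 7961\n"
    "\tLnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1\n"
    "\tLnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+\n"
)

if __name__ == "__main__":
    print(aspm_state(SAMPLE))
```

The text would come from something like `lspci -vv -s 03:00.0` run as root. Without root the capabilities show up as `<access denied>`, as in the listing quoted above, and the function returns None.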