On Thu, Jul 11, 2024 at 06:59:22PM +0200, Johan Hovold wrote:
> On Thu, Jul 11, 2024 at 10:11:53PM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Jul 11, 2024 at 09:49:52PM +0530, Manivannan Sadhasivam wrote:

> > > My hunch is the PHY settings. But Abel cross checked the PHY settings with
> > > internal documentation and they seem to match. Also, Qcom submitted a series
> > > that is supposed to fix stability issues with Gen4 [1]. With this series, Gen 4
> > > x4 setup is working on SA8775P-RIDE board as reported by Qcom. But Abel
> > > confirmed that it didn't help him with the link downgrade issue.
> > >
> > > Perhaps you can give it a try and see if it makes any difference for
> > > this issue?
>
> If there are known issues with running at Gen4 speed without that
> series, then it seems quite likely that doing so anyway could also cause
> correctable errors.
>
> Unfortunately, I get a hypervisor reset when I tried booting with that
> series so there appears to be some implicit dependency on something
> else (e.g. the 4l stuff).

The first patch in that series breaks icc handling, which crashes
machines like the X13s and the x1e80100 CRD on boot. I've just reported
this here:

	https://lore.kernel.org/lkml/ZpDlf5xD035x2DqL@xxxxxxxxxxxxxxxxxxxx/

With that fixed, and with the hacky dependency on having max-link-speed
specified in the DT for the series to have any effect at all, the gen4
stability series indeed seems to make the AER error go away (Abel just
confirmed using a branch I'd prepared).

Let's try to get that series in shape and merged in some form as
everyone will be hitting these Correctable Errors currently with the
NVMe on x1e80100.

Johan