> -----Original Message-----
> From: Jon Hunter <jonathanh@xxxxxxxxxx>
> Sent: 25 March 2021 16:01
> To: Joakim Zhang <qiangqing.zhang@xxxxxxx>
> Cc: netdev@xxxxxxxxxxxxxxx; Linux Kernel Mailing List
> <linux-kernel@xxxxxxxxxxxxxxx>; linux-tegra <linux-tegra@xxxxxxxxxxxxxxx>;
> Jakub Kicinski <kuba@xxxxxxxxxx>
> Subject: Re: Regression v5.12-rc3: net: stmmac: re-init rx buffers when mac
> resume back
>
>
> On 25/03/2021 07:53, Joakim Zhang wrote:
> >
> >> -----Original Message-----
> >> From: Jon Hunter <jonathanh@xxxxxxxxxx>
> >> Sent: 24 March 2021 20:39
> >> To: Joakim Zhang <qiangqing.zhang@xxxxxxx>
> >> Cc: netdev@xxxxxxxxxxxxxxx; Linux Kernel Mailing List
> >> <linux-kernel@xxxxxxxxxxxxxxx>; linux-tegra
> >> <linux-tegra@xxxxxxxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>
> >> Subject: Re: Regression v5.12-rc3: net: stmmac: re-init rx buffers
> >> when mac resume back
> >>
> >>
> >>
> >> On 24/03/2021 12:20, Joakim Zhang wrote:
> >>
> >> ...
> >>
> >>> Sorry for this breakage at your side.
> >>>
> >>> You mean one of your boards? Do other boards with STMMAC work fine?
> >>
> >> We have two devices with the STMMAC and one works OK and the other fails.
> >> They are different generations of device and so there could be some
> >> architectural differences which are causing this to only be seen on one
> >> device.
> >
> > It's really strange, but I also don't know what architectural differences
> > could affect this. Sorry.
>
>
> Maybe caching somewhere? In other words, could there be any cache flushing
> that we are missing here?

I have no idea; I have not taken such a case into account.

> >>> We do daily tests with NFS to mount rootfs, no issue found. And I added
> >>> this patch at the resume path, and no error check, this should not break
> >>> suspend.
> >>> I even did the overnight stress test, there is no issue found.
> >>>
> >>> Could you please do more tests to see where the issue happens?
> >>
> >> The issue occurs 100% of the time on the failing board and always on
> >> the first resume from suspend. Is there any more debug I can enable
> >> to track down what the problem is?
> >>
> >
> > As the commit message describes, the patch aims to re-init the rx buffer
> > addresses. Since the addresses are not fixed, I can only recycle and then
> > re-allocate all of them. The page pool is allocated once when the net
> > device is opened.
> >
> > Could you please debug whether it fails in some function, such as
> > page_pool_dev_alloc_pages()?
>
>
> Yes that was the first thing I tried, but no obvious failures from allocating the
> pools.
>
> Are you certain that the problem you are seeing, that is being fixed by this
> change, is generic to all devices? The commit message states that 'descriptor
> write back by DMA could exhibit unusual behavior', is this a known issue in the
> STMMAC controller? If so does this impact all versions and what is the actual
> problem?

Yes, I confirm this patch fixes the issue on my side. It should not be
generic; it reproduces on one of our boards. To be honest, I have not found
the root cause, so this should be considered a workaround. I upstreamed it
since I thought it would not affect others who don't suffer from this.

Best Regards,
Joakim Zhang
> Jon
>
> --
> nvpublic