On Wed, May 24, 2017 at 09:14:52AM +0800, Shawn Lin wrote:
> On 2017/5/24 9:00, Brian Norris wrote:
> >On Wed, May 24, 2017 at 08:54:14AM +0800, Shawn Lin wrote:
> >>The reason I added this check is that I saw an external abort
> >>in rockchip_pcie_rd_own_conf, which I strongly suspected happened
> >>because the link was being re-initialized or was totally broken at that time.
> >
> >I've seen plenty of aborts in this function as well, but I've verified
> >that the link was still reported "up" in all the cases I could reproduce.
> >
>
> I think that's reasonable, as the link could be retrained automatically if
> it's not totally broken. Did you power off the endpoint, and could it
> still pass this check?

I don't think I powered it off entirely, but I did try asserting its PD#
pin, which powers off most of the functionality -- enough that it
apparently causes aborts, but doesn't bring the link down.

> >So, do you "suspect" or did you "prove"? e.g., log cases where this
> >check actually helps?
>
> I was powering off the devices and running lspci, and saw the log cases
> there. I will check this again.
>
> >
> >And to Bjorn's point: do you know *why* such cases were hit? That would
> >help to understand if the cases you're worrying about are hopelessly
> >racy, or if there's some way to ensure synchronization.

OK, so you've answered this question: losing power is hopelessly racy.

I guess it's up to Bjorn whether this racy check is useful at all, then.

Brian