On Thu, 2018-03-22 at 00:36 -0400, okaya@xxxxxxxxxxxxxx wrote: > On 2018-03-22 00:03, Benjamin Herrenschmidt wrote: > > Hi Folks ! > > > > So while chasing some issues in our EEH error handling, we noticed that > > the generic code has about a bazillion of "reset" path for devices, > > most of them seemingly missing a wait for CRS after the reset. > > > > That includes PM based resets or wakeups (can a D3->D0 transition cause > > CRS to be returned ? Unclear but we should try to be safe), but mostly > > it includes anything that resets the pcie port (PERST) or the secondary > > bridge reset (hot resets). > > > > For example take __pci_reset_function_locked(...), it can call > > pci_parent_bus_reset() which will perform a hot reset but will *not* > > wait for CRS. > > > > There are a plethora of reset path in there that are similar, it's > > actually hard to figure out which is what, but they all have in common > > that they don't wait for CRS with the notable exception of the FLR > > case. > > Bjorn merged my change for 4.17. Kernel should handle these now. Ah nice ! I'll check that out, I was checking 4.16-rc6 ! > > I'm keen on doing a rather "blanket" fix by adding a CRS wait inside > > pci_dev_restore(). Would you guys agree ? > > > > Also why does pci_flr_wait() not use vendor/device ID but instead waits > > on the COMMAND register being all 1's ? It's not clear to me ... > > VID/DID will give a very specific signature for CRS which is ffff0001 > > while COMMAND could return all 1's for other reasons (device unplugged > > for example). > > > > Because if you read vendor id of a virtual function, you get 0xffffffff Ah indeed, I forgot about that... Thanks ! Cheers, Ben.