On Mon, Apr 23, 2018 at 01:10:44PM -0600, Alex Williamson wrote: > On Mon, 23 Apr 2018 13:28:22 -0400 > Sinan Kaya <okaya@xxxxxxxxxxxxxx> wrote: > > > On 4/20/2018 11:04 AM, Alex Williamson wrote: > > > Is there a concern here about whether the endpoint device driver or the > > > PCI core really knows better about link retraining? This makes me > > > remember my unfinished (and need to revisit) Pericom quirk[1] where > > > errata in the PCIe switch requires that upstream and downstream links > > > are balanced (ie. same link rate) or else enabling ACS results in > > > packets not properly flowing through the switch. If an endpoint driver > > > starts deciding to retrain links, overriding quirks in the PCI core, > > > then such topology manipulation isn't possible. Why does the > > > driver .probe() function think it can retrain at a higher link rate > > > than PCI core? Thanks, > > > > The example given is for some serdes firmware load to happen in probe and > > then performing a retrain to reach to a better speed. > > > > It becomes a chicken and egg problem. > > > > 1. Endpoint HW trains to gen1 by default pre-boot. > > 2. PCI core enumerates the device. > > 3. Endpoint driver gets loaded > > 4. Driver does the firmware programming followed by a link retrain. > > > > I think it is the responsibility of the PCI core to provide reset APIs. > > However, expecting endpoint drivers to be knowledgeable about hotplug is > > too much. > > > > We can certainly contain AER change into pci directory by moving the slot > > reset function to drivers/pci.h file. > > > > But, we need to think about what to do about VFIO and other endpoint > > initiated reset cases. My suggestion was to move this into a single API and > > remove all other APIs from include/linux/pci.h. > > I'm a little confused about the relation between reset and retrain. > AIUI we can retrain the link without any sort of endpoint reset and if > some sort of driver/firmware setup is required on the endpoint to > achieve the target link speed, then I'd think we only want to retrain. In hfi1, do_pcie_gen3_transition() resets the device. I don't know if retraining the link would be sufficient; maybe the reset is required to make the device use the new firmware. I guess we already export reset interfaces, so if we add a retrain interface, drivers could choose what they need. > How this is going to work with vfio is an interesting question. We're > only providing access to the device, not the link to the device. > Multifunction endpoints become a big problem if one function starts > requesting link retraining while another is in use elsewhere. Can we just make it the driver's problem by returning -EPERM if one function requests a retrain while another function is in use? Bjorn