Hi Bjorn,

> On Nov 21, 2016, at 4:05 PM, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> Hi Matthew,
>
> On Mon, Nov 21, 2016 at 03:09:49PM -0600, Matthew R. Ochs wrote:
>> The PCI core uses a fixed 50ms timeout when waiting for VPD accesses to
>> complete. When an access does not complete within this period, a warning
>> is logged and an error returned to the caller.
>>
>> While this default timeout is valid for most hardware, some devices can
>> experience longer access delays under certain circumstances. For example,
>> one of the IBM CXL Flash devices can take up to ~120ms in a worst-case
>> scenario. These types of devices can benefit from an extended timeout that
>> is specific to their hardware constraints.
>>
>> To support per-device VPD access timeouts, pci_set_vpd_timeout() is added
>> as an exported service. PCI devices will continue to default with the 50ms
>> timeout and use a per-device timeout when a driver calls this new service.
>
> Can you include a pointer to something in the spec that's behind the
> default 50ms timeout, or did somebody just pull that number out of the
> air?

AFAIK the PCI spec is silent on VPD access timeouts. The current 50ms timeout
can be traced to commit 1120f8b8169f ("PCI: handle long delays in VPD access"),
where the timeout was increased to accommodate specific hardware. Prior to
that, the wait timeout depended on whether the access was a read or a write,
with writes waiting up to a maximum of 10ms.

> I'm wondering how we know 50ms or 120ms or 250ms or whatever is the
> right number. What bad things would happen if we just increased the
> timeout from 50 to 125ms for *all* devices?

You're asking the right questions. The timeout chosen for the CXL flash
device was derived through instrumentation, and only after we saw VPD
timeout messages appear in the kernel log at random times. I originally
thought about proposing a blanket increase, but figured the scope might be
too broad.
There are 2 downsides I see with simply replacing 50ms with a larger value:

- Raising the timeout bar [potentially] raises the total time it takes to
  complete a VPD access, since a stalled access now waits longer before it
  fails. One would hope that scenarios where every access times out are very
  rare and that the max limits are only rarely encountered.

- It's difficult to settle on a single 'catch all' value. What might be fine
  for h/w A may end up not working for h/w B (as is the case here). That
  said, given that 50ms has served as the value for roughly nine years, I
  think this point doesn't carry much weight.

>
> I don't really want to end up with a bunch of device-specific quirks
> here. If we have a quirk to work around one defective device, that's
> one thing. If the spec allows a huge variation in VPD access time,
> that might be something we want to handle

I agree 100% and would be more than happy to submit a patch that simply
increases the value.

-matt

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html