Re: [PATCH 1/3] PCI: Make PCIE_RESET_READY_POLL_MS configurable

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 24, 2020 at 05:52:14PM +0000, Spassov, Stanislav wrote:
> On Mon, 2020-02-24 at 08:15 -0600, Bjorn Helgaas wrote:
> > On Sun, Feb 23, 2020 at 01:20:55PM +0100, Stanislav Spassov wrote:
> > > From: Wei Wang <wawei@xxxxxxxxx>
> > > 
> > > The reasonable value for the maximum time to wait for a PCI
> > > device to be ready after reset varies depending on the platform
> > > and the reliability of its set of devices.
> > 
> > There are several mechanisms in the spec for reducing these times,
> > e.g., Readiness Notifications (PCIe r5.0, sec 6.23), the Readiness
> > Time Reporting capability (sec 7.9.17), and ACPI _DSM methods (PCI
> > Firmware Spec r3.2, sec 4.6.8, 4.6.9).
> > 
> > I would much rather use standard mechanisms like those instead of
> > adding kernel parameters.  A user would have to use trial and
> > error to figure out the value to use with a parameter like this,
> > and that doesn't feel like a robust approach.
> 
> I agree that supporting the standard mechanisms is highly desirable,
> but some sort of fallback poll timeout value is necessary on
> platforms where none of those mechanisms are supported. Arguably,
> some kernel configurability (whether via a kernel parameter, as
> proposed here, or some other means) is still desirable.

IIUC we basically have two issues: 1) the default 60 second timeout is
too long, and 2) you'd like to reduce the delays further à la the
Device Readiness _DSM even for firmware that doesn't support that.

The 60 second timeout came from 821cdad5c46c ("PCI: Wait up to 60
seconds for device to become ready after FLR") and is probably too
long.  We probably should pick a smaller value based on numbers from
the spec and make quirks for devices that needed more time.

As far as reducing delays for specific devices, if we can identify
them via Vendor/Device ID, can we make per-device values that could be
set either by the _DSM or by a quirk?

I'm trying to wriggle out of adding yet more PCI kernel parameters
because people tend to stumble across them and pass them around on
bulletin boards as ways to "fix" or "speed up" things that really
should be addressed in better ways.

> I also agree there is no robust method to determine a "good value", but
> then again - how was the current value for the constant determined? As
> suggested in PATCH 2, the idea is to lower the global timeout to avoid
> hung tasks when devices break and never come back after reset.

I don't remember exactly how we came up with 60 seconds; I suspect it
was just a convenient large number.

Bjorn



[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux