Re: [PATCH] PCI: Workaround for Intel MPS errata

Jon Mason <mason@xxxxxxxx> · Wed, 5 Oct 2011 12:06:18 -0500

On Wed, Oct 5, 2011 at 11:37 AM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
> On Wed, Oct 5, 2011 at 10:17 AM, Jon Mason <mason@xxxxxxxx> wrote:
>> Intel 5000 and 5100 series memory controllers have a known issue if read
>> completion coalescing is enabled and the PCI-E Maximum Payload Size is
>> set to 256B.  To work around this issue, disable read completion
>> coalescing in the memory controller and root complexes.  Unfortunately,
>> it must always be disabled, even if no 256B MPS devices are precent, due
>
> "present"
>
>> to the possiblity of one being hotplugged.
>
> "possibility"

One of these days I'll learn to always run spell check on these.  Thanks.

>
>> Links to erratas:
>> http://www.intel.com/content/dam/doc/specification-update/5000-chipset-memory-controller-hub-specification-update.pdf
>> http://www.intel.com/content/dam/doc/specification-update/5100-memory-controller-hub-chipset-specification-update.pdf
>>
>> Thanks to Jesse Brandeburg and Ben Hutchings for providing insight into
>> the problem.
>>
>> Tested-and-Reported-by: Avi Kivity <avi@xxxxxxxxxx>
>> Signed-off-by: Jon Mason <mason@xxxxxxxx>
>> ---
>>  drivers/pci/quirks.c |   69 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 69 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index 1196f61..70da3f5 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -2822,6 +2822,75 @@ static void __devinit fixup_ti816x_class(struct pci_dev* dev)
>>  }
>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_TI, 0xb800, fixup_ti816x_class);
>>
>> +/* Intel 5000 and 5100 Memory controllers have an errata with read completion
>> + * coalescing (which is enabled by default on some BIOSes) and MPS of 256B.
>> + * Since there is no way of knowing what the PCIE MPS on each fabric will be
>> + * until all of the devices are discovered and buses walked, read completion
>> + * coalescing must be disabled.  Unfortunately, it cannot be re-enabled because
>> + * it is possible to hotplug a device with MPS of 256B.
>> + */
>> +static void __devinit quirk_intel_mc_errata(struct pci_dev *dev)
>> +{
>> +       int err;
>> +       u16 rcc;
>> +
>> +       if (pcie_bus_config == PCIE_BUS_TUNE_OFF)
>> +               return;
>> +
>> +       /* Intel errata specifies bits to change but does not say what they are.
>> +        * Keeping them magical until such time as the registers and values can
>> +        * be explained.
>> +        */
>> +       err = pci_read_config_word(dev, 0x48, &rcc);
>> +       if (err) {
>> +               dev_err(&dev->dev, "Error attempting to read the read "
>> +                       "completion coalescing register.\n");
>> +               return;
>> +       }
>> +
>> +       if (!(rcc & (1 << 10)))
>> +               return;
>> +
>> +       rcc &= ~(1 << 10);
>> +
>> +       err = pci_write_config_word(dev, 0x48, rcc);
>> +       if (err) {
>> +               dev_err(&dev->dev, "Error attempting to write the read "
>> +                       "completion coalescing register.\n");
>> +               return;
>> +       }
>> +
>> +       pr_info_once("Read completion coalescing disabled due to hardware "
>> +                    "errata relating to 256B MPS.\n");
>
> Can you use dev_info() here so we have the context that (a) this is
> PCI-related and (b) specifically related to device X?

It would print out 6 times for each system (one for the memory
controller and one for each root port).  I thought printing it once
was better, since it is really a system wide setting.

>
>> +}
>> +/* Intel 5000 series memory controllers and ports 2-7 */
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25c0, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25d0, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25d4, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25d8, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25e2, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25e3, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25e4, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25e5, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25e6, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25e7, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25f7, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25f8, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25f9, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x25fa, quirk_intel_mc_errata);
>> +/* Intel 5100 series memory controllers and ports 2-7 */
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65c0, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65e2, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65e3, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65e4, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65e5, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65e6, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65e7, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65f7, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65f8, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65f9, quirk_intel_mc_errata);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65fa, quirk_intel_mc_errata);
>
> Wow, that's a long list.  Makes me suspect that we'll have to keep
> adding things to it, but I guess there's no good way to do it
> generically.

I looked at a number of other Intel MC erratas, and didn't see any MPS
related issues.  Hopefully this is it for these, though who knows if
there are similar issues on AMD/nVidia memory memory controllers.
Guess there is one way to find out. :)

Should I push this for 3.1-rcX or wait for 3.2?

>
> Bjorn
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html