Re: [PATCH][RFC] PCI: Workaround to enable poweroff on Mac Pro 11

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 08, 2016 at 07:47:42AM -0500, Bjorn Helgaas wrote:
> On Wed, Jun 08, 2016 at 12:31:27PM +0800, Chen Yu wrote:
> > Hi Bjorn,
> > 
> > On 2016年05月31日 21:16, Bjorn Helgaas wrote:
> > >On Tue, May 31, 2016 at 03:18:02PM +0800, Chen Yu wrote:
> > >>On 2016年05月31日 15:00, Yinghai Lu wrote:
> > >>>On Mon, May 30, 2016 at 8:24 PM, Chen Yu <yu.c.chen@xxxxxxxxx> wrote:
> > >>>
> > >>>>and then in pcibios_assign_resources, 0000:00:1c.0 tries to allocate minimal
> > >>>>resource window and then update related base/limit registers:
> > >>>>
> > >>>>[    0.865342] pci 0000:00:1c.0: bridge window [io  0x1000-0x0fff] to [bus
> > >>>>02] add_size 1000
> > >>>>[    0.865343] pci 0000:00:1c.0: bridge window [mem 0x00100000-0x000fffff
> > >>>>64bit pref] to [bus 02] add_size 200000 add_align 100000
> > >>>>[    0.865344] pci 0000:00:1c.0: bridge window [mem 0x00100000-0x000fffff]
> > >>>>to [bus 02] add_size 200000 add_align 100000
> > >>>>
> > >>>That is for hotplug bridge, then we could use following instead.
> > >>>
> > >>>diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > >>>index ee72ebe..d3ec833 100644
> > >>>--- a/drivers/pci/quirks.c
> > >>>+++ b/drivers/pci/quirks.c
> > >>>@@ -2775,6 +2775,13 @@ static void quirk_hotplug_bridge(struct pci_dev *dev)
> > >>>
> > >>>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HINT, 0x0020, quirk_hotplug_bridge);
> > >>>
> > >>>+static void quirk_hotplug_bridge_skip(struct pci_dev *dev)
> > >>>+{
> > >>>+       dev->is_hotplug_bridge = 0;
> > >>>+}
> > >>>+
> > >>>+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x8c10,
> > >>>quirk_hotplug_bridge_skip);
> > >>>+
> > >>>  /*
> > >>>   * This is a quirk for the Ricoh MMC controller found as a part of
> > >>>   * some mulifunction chips.
> > >>Good idea, but in this way we might not have io window allocated for
> > >>it?I'm not sure
> > >>if this would break wifi,etc, I'll suggest reporters to have a try.
> > >Let's figure out the root cause before trying more random fixes.  I
> > >have the same objection to the patch above.  No doubt there are many
> > >ways we could "fix" this, but unless we know the root cause, we're
> > >likely to make a change that's not a complete fix or will cause other
> > >issues in the future.
> > >
> > I just let the reporter  paste their lspci in Mac OS, it looks that
> > Mac OS also
> > does not allocate any resource for this broken pci bridge, and other pci
> > devices have almost the same resource region as it is in linux, so I think
> > this is an evidence that Apple does not want to use this firmware for now,
> > maybe reserved for future use, declaim resource for this pci bridge might
> > cause unpredictable result, how about adding a dmi+pci quirk for this
> > special platform?
> 
> This is not a root cause.  We still have no idea why the problem
> occurs.  I do not think a quirk is a good idea until we know what the
> problem is and exactly how a quirk would fix it.
> 
> > lspci result from Mac OS, there's no resource behind this bridge:
> > https://bugzilla.kernel.org/attachment.cgi?id=219321

Here are some ideas for debugging this:

1) Apparently the problem is sensitive to programming the prefetchable
   memory aperture of 00:1c.0 to [mem 0x7fc00000-0x7fdfffff 64bit
   pref].  The only effect of that *should* be that the bridge will
   now claim accesses in the aperture, when it didn't before.

   We *think* there's nothing else at the address of that aperture,
   but if there is an unreported device there, it may stop working.  I
   would pore over the E820 memory map, EFI memory map, all ACPI _CRS
   methods, etc., looking for anything in that area that we haven't
   accounted for.

   There are lots of anomalies in this system, e.g., (these are from
   Bastien's dmesg log at
   https://bugzilla.kernel.org/attachment.cgi?id=208961)

     BIOS-e820: [mem 0x00000000e00f8000-0x00000000e00f8fff] reserved
     PCI: MMCONFIG for domain 0000 [bus 00-9b] at [mem 0xe0000000-0xe9bfffff] (base 0xe0000000)
     acpi PNP0A08:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-9b] only partially covers this bridge
     pci_bus 0000:00: root bus resource [mem 0x7fa00000-0xfeafffff window]
     system 00:04: [mem 0xe0000000-0xefffffff] could not be reserved
     ACPI Warning: SystemIO range 0x000000000000EFA0-0x000000000000EFBF conflicts with OpRegion 0x000000000000EFA0-0x000000000000EFAF (\_SB.PCI0.SBUS.SMBI)

   The MMCONFIG area appears correctly described in the 00:04 _CRS,
   but incorrectly in MCFG.  The E820 region appears to be a chunk in
   the middle of the MMCONFIG area.  The host bridge window 
   [mem 0x7fa00000-0xfeafffff window] is clearly bogus -- it includes
   the MMCONFIG area, which is definitely not a window.  I doubt
   anything above 0xdfffffff should be included.

   I don't know what the ACPI conflict warning is about, but I'd try
   to figure it out.

   Either the firmware is badly broken, or we're not interpreting
   something correctly.

   I'd try assigning the 00:1c.0 aperture at the end of the actual
   aperture instead of the beginning.  I think Windows, and likely
   MacOS uses a top-down allocation strategy instead of bottom-up like
   Linux does.  If that makes poweroff work, there is likely an
   unreported device in the [mem 0x7fc00000-0x7fdfffff 64bit pref]
   area.

   This experimentation could all be done with setpci, without
   requiring kernel patches.

2) I would explore exactly what the grub "halt" command does and how
   it compares to what Linux is doing.  I see the assertion that this
   is related to [io 0x1804], but I don't know what that's based on.
   Programming the 00:1c.0 prefetchable aperture shouldn't have
   anything to do with I/O ports, so if it really is related, there
   might be some SMM magic or something where SMM code is doing
   something that relates to the memory aperture.

3) The lspci output at https://bugzilla.kernel.org/attachment.cgi?id=219321
   (I think this is from MacOS) shows invalid data for devices
   starting at 04:00.0.  Why?  Maybe this is an unrelated artifact,
   but it doesn't smell right.

4) If you can hot-add devices under MacOS, look to see what address
   space they get assigned.  That may tell you what allocation
   strategy MacOS uses.

5) 00:1c.0 claims to have a slot that supports hotplug.  Is that
   actually true?  Could you add a device below it?  If not, maybe the
   problem is that the BIOS should have configured 00:1c.0 so it
   doesn't report a slot.  If it didn't report a slot, we shouldn't
   assign resources to it, since there is no possibility of a device
   below it.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-arch" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel]     [Kernel Newbies]     [x86 Platform Driver]     [Netdev]     [Linux Wireless]     [Netfilter]     [Bugtraq]     [Linux Filesystems]     [Yosemite Discussion]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]

  Powered by Linux