Re: TR: mptsas/iommu/pciehp : PCIe hotplug with 2.6.30 and 2.6.31-rc9 with IOMMU enabled.

On Wed, Sep 09, 2009 at 04:13:44PM -0400, Isabelle, Francois wrote:
> Hi.
> 
> We are currently having an issue with PCIE hotplug of a LSI SAS1064E
> embedded controller when VT-d is enabled and the IOMMU driver is loaded.
> I can't tell yet if it's a fault in the iommu driver code or something
> else in the platform, but things work smoothly with the iommu disabled.  

Generally, this is less likely to be a fault in the IOMMU code and more
likely a bug in the driver (not setting up DMA properly). Enabling the IOMMU
just enforces that DMA activity actually goes through DMA mappings. This
isn't 100% enforcement, due to performance issues, but nearly.

This case might be an exception but needs more investigation.
More questions below that might help lead to the root cause.

> When the IOMMU is enabled (intel_iommu=on), the IOC gets in a FAULT state:
> 
> I used this to increase the verbosity level:
> 
> rmmod mptsas;rmmod mptscsih;rmmod mptbase;modprobe pciehp pciehp_debug=1 ;modprobe mptbase mpt_debug_level=0xFFFFFF;modprobe mptsas
> 
> 
> Some details about the platform:
> 
> lspci
> 00:00.0 Host bridge: Intel Corporation 5520/5500/X58 I/O Hub to ESI Port (rev 13)
> 00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 13)
> 00:04.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 4 (rev 13)
> 00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 13)
> 00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 13)
> 00:08.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 8 (rev 13)
> 00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 13)
> 00:0a.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 10 (rev 13)
> 00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 13)
> 00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 13)
> 00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 13)
> 00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 13)
> 00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
> 00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
> 00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
> 00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
> 00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
> 00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
> 00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
> 00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
> 00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
> 00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
> 00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
> 00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
> 00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
> 00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
> 00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
> 02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 03:00.0 Ethernet controller: Intel Corporation Device 10f7
> 03:00.1 Ethernet controller: Intel Corporation Device 10f7
> 04:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 04:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 06:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 08)
> 
> And here is some data when the problem occurs.

I really need the entire console output from boot. ACPI spews info about
the IOMMU resources that is relevant to debugging this.

> pt_debug_level=ffffffh
> mptbase: ioc1: : mpt_adapter_install
> mptsas 0000:06:00.0: enabling device (0000 -> 0002)
> mptsas 0000:06:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> mptbase: ioc1: : 32 BIT PCI BUS DMA ADDRESSING SUPPORTED
> mptbase: ioc1: mem = ffffc90010570000, mem_phys = d8010000
> mptbase: ioc1: facts @ ffff88003d5df41c, pfacts[0] @ ffff88003d5df46c
> mptbase: ioc1: Initiating bringup
> mptbase: ioc1: MakeIocReady [raw] state=10000000
> mptbase: ioc1: IOC is in READY state
>   03000000 00000000 00000000
>   03140105 00001400 00000000 00000000 00000000 00080022 002001ff 27040000 00000000 00010115 00000000 01700000 00000000 00000807 011b0000 00000100 00000000 00000000 00000000 00000000
>   05000000 00000000 00000000
>   050a0000 00000000 00000000 00000000 00000000 003f3000 00090070 00700001 00000000 00000000
> ioc1: LSISAS1064E B3: Capabilities={Initiator}
> mptbase: ioc1: installed at interrupt 16
> mptbase: ioc1: PrimeIocFifos

PrimeIocFifos calls pci_alloc_consistent() and has debug code to dump
the DMA resource allocated (both virtual and DMA addresses). Offhand,
I don't know how to enable that, but it would be the next step.

This code looks broken in that it calls pci_alloc_consistent()
before calling pci_set_consistent_dma_mask(). This is almost certain
to cause problems if ioc->dma_mask is not DMA_BIT_MASK(32).
The pci_set_dma*mask() calls should move to the beginning of the function.


> mptbase: ioc1: SendIocInit
>   02000004 00017000 00000000
>   02050004 00017000 00000000 00000000 00000000
>   06000000 00000000 00000000
> DRHD: handling fault status reg 2
> DMAR:[DMA Read] Request device [06:00.0] fault addr fffc2000

fffc2000 seems to be an unusual address to DMA from/to.
Is fffc2000 in address space reserved for the IOMMU?
(The ACPI DMAR info should tell us this.)
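
To make the question about the address concrete, here is a minimal
sketch (Python, userspace, not kernel code) that pulls the fields out of
the DMAR fault line and checks a few basic properties of the address.
The regex, the function names and the 4 KiB page size are my own
assumptions for illustration; only the log line itself comes from the
report.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB IOMMU page granularity

# Matches lines of the form seen in the report, e.g.
# "DMAR:[DMA Read] Request device [06:00.0] fault addr fffc2000"
FAULT_RE = re.compile(
    r"DMAR:\[DMA (?P<kind>Read|Write)\] Request device "
    r"\[(?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] "
    r"fault addr (?P<addr>[0-9a-f]+)"
)


def dma_bit_mask(bits):
    """Python equivalent of the kernel's DMA_BIT_MASK(n)."""
    return (1 << bits) - 1


def parse_dmar_fault(line):
    """Return the request kind, device and address, or None if no match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return {
        "kind": m.group("kind"),
        "bdf": m.group("bdf"),
        "addr": int(m.group("addr"), 16),
    }


def describe(addr):
    """Basic facts about a faulting DMA address."""
    return {
        "fits_32bit_mask": addr <= dma_bit_mask(32),
        "page_aligned": addr % PAGE_SIZE == 0,
        "bytes_below_4g": (1 << 32) - addr,
    }


if __name__ == "__main__":
    line = "> DMAR:[DMA Read] Request device [06:00.0] fault addr fffc2000"
    fault = parse_dmar_fault(line)
    print(fault)
    print(describe(fault["addr"]))
```

For the line above, the address is page aligned, fits in a 32-bit DMA
mask, and sits 0x3e000 bytes (248 KiB) below 4 GiB. None of that says
whether the range is reserved; that still has to come from the DMAR
table.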

> DMAR:[fault reason 06] PTE Read access is not set

It's also odd that "Read access is not set" is reported for something
(ioc_init) that I think should be bi-directional. We need to track down
the code in the MPT driver that prepares the DMA for SendIocInit and
compare it to the PTE's access rights.
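
As a rough illustration of how the mapping direction relates to the
PTE access rights, here is a toy model in Python. It assumes the VT-d
page-table entry layout where bit 0 grants read and bit 1 grants write,
and uses fault reason numbers 5 and 6 to match the "fault reason 06"
text in the log. ToyIommu, prot_for_direction() and the direction
strings are hypothetical names; the real intel-iommu code has more
cases than this.

```python
PAGE_SHIFT = 12      # assumption: 4 KiB pages

DMA_PTE_READ = 1     # bit 0: device may read memory through this PTE
DMA_PTE_WRITE = 2    # bit 1: device may write memory through this PTE

FAULT_PTE_WRITE_NOT_SET = 5
FAULT_PTE_READ_NOT_SET = 6


def prot_for_direction(direction):
    """Simplified mapping from DMA direction to PTE permission bits."""
    prot = 0
    if direction in ("to_device", "bidirectional"):
        prot |= DMA_PTE_READ    # device reads what the CPU wrote
    if direction in ("from_device", "bidirectional"):
        prot |= DMA_PTE_WRITE   # device writes, CPU reads afterwards
    return prot


class ToyIommu:
    """Flat page-frame -> permission-bits table for one device."""

    def __init__(self):
        self.ptes = {}

    def map(self, iova, size, direction):
        prot = prot_for_direction(direction)
        first = iova >> PAGE_SHIFT
        last = (iova + size - 1) >> PAGE_SHIFT
        for pfn in range(first, last + 1):
            self.ptes[pfn] = prot

    def unmap(self, iova, size):
        first = iova >> PAGE_SHIFT
        last = (iova + size - 1) >> PAGE_SHIFT
        for pfn in range(first, last + 1):
            self.ptes.pop(pfn, None)

    def check(self, iova, kind):
        """Return None if the access is allowed, else a fault reason."""
        prot = self.ptes.get(iova >> PAGE_SHIFT, 0)  # absent PTE == 0
        if kind == "read" and not prot & DMA_PTE_READ:
            return FAULT_PTE_READ_NOT_SET
        if kind == "write" and not prot & DMA_PTE_WRITE:
            return FAULT_PTE_WRITE_NOT_SET
        return None


if __name__ == "__main__":
    iommu = ToyIommu()
    print(iommu.check(0xFFFC2000, "read"))   # no mapping at all
    iommu.map(0xFFFC2000, 4096, "from_device")
    print(iommu.check(0xFFFC2000, "read"))   # mapped write-only
    iommu.map(0xFFFC2000, 4096, "bidirectional")
    print(iommu.check(0xFFFC2000, "read"))   # allowed
```

In this model a device read faults with reason 6 both when the page was
mapped write-only and when there is no mapping at all (an all-zero
entry), so the fault line alone does not tell a wrong direction apart
from a missing or stale mapping.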

Looking at Documentation/Intel-IOMMU.txt, fed9x000 seems to be
the DRHD base address (the IOMMU's register block). But I don't know
which PCI address range is reserved for the IOMMU to decode.
(Someone from Intel can probably tell based on the chipset.)

hth,
grant

> mptbase: ioc1: WARNING - Issuing Reset from mpt_config!!
> mptbase: ioc1: Initiating recovery
> mptbase: ioc1: MakeIocReady [raw] state=40002000
> mptbase: ioc1: WARNING - IOC is in FAULT state!!!
> mptbase: ioc1: WARNING -            FAULT code = 2000h
> mptbase: ioc1: Recovered from IOC FAULT
>   03000000 00000000 00000000
>   03140105 00001400 00000000 00000000 00000000 00080022 002001ff 27040000 00000000 00010115 00000000 01700000 00000000 00000807 011b0000 00000100 00000000 00000000 00000000 00000000
> mptbase: ioc1: PrimeIocFifos
> mptbase: ioc1: SendIocInit
>   02000004 00017000 00000000
>   02050004 00017000 00000000 00000000 00000000
>   06000000 00000000 00000000
> 
> 
> Here is my current hypothesis:
> 
> For some reason, on hotplug reactivation, the resources claimed by the IOC are not the same that were used before and for which the IOMMU has a translation enabled and subsequent DMA access are rejected.
> 
> But I'm having a hard time figuring where to look at first, should the resource assigned exactly the same? How is the IOMMU supposed to deal with hot removal of PCI endpoint devices?
> 
> Thank you.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html