TR: mptsas/iommu/pciehp : PCIe hotplug with 2.6.30 and 2.6.31-rc9 with IOMMU enabled.

Hi.

We are currently seeing an issue with PCIe hotplug of an LSI SAS1064E embedded controller when VT-d is enabled and the IOMMU driver is loaded. I can't yet tell whether the fault lies in the IOMMU driver code or somewhere else in the platform, but everything works smoothly with the IOMMU disabled.

When the IOMMU is enabled (intel_iommu=on), the IOC ends up in a FAULT state.
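For reading the log further down, here is a minimal sketch that decodes the raw state word mptbase prints from MakeIocReady (state=10000000, state=40002000). It assumes the usual MPI doorbell layout, with the IOC state in the top nibble and the fault code in the low 16 bits. The function name decode_ioc_state is made up for this illustration.

```python
# Assumption: MPI doorbell layout, state in bits 31..28, fault code in bits 15..0.
IOC_STATE_MASK = 0xF0000000
FAULT_CODE_MASK = 0x0000FFFF

IOC_STATES = {
    0x00000000: "RESET",
    0x10000000: "READY",
    0x20000000: "OPERATIONAL",
    0x40000000: "FAULT",
}


def decode_ioc_state(raw):
    """Return (state name, fault code or None) for a raw doorbell word."""
    state = IOC_STATES.get(raw & IOC_STATE_MASK, "UNKNOWN")
    # The low 16 bits are only read as a fault code in the FAULT state.
    fault_code = (raw & FAULT_CODE_MASK) if state == "FAULT" else None
    return state, fault_code


# The two raw values that appear in the log.
for text in ("10000000", "40002000"):
    state, code = decode_ioc_state(int(text, 16))
    suffix = "" if code is None else " (fault code %04xh)" % code
    print("state=%s -> %s%s" % (text, state, suffix))
```

Under that assumption, 40002000 splits into the FAULT state and fault code 2000h, which matches the two WARNING lines mptbase prints.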

I used this to increase the verbosity level:

rmmod mptsas; rmmod mptscsih; rmmod mptbase; modprobe pciehp pciehp_debug=1; modprobe mptbase mpt_debug_level=0xFFFFFF; modprobe mptsas


Some details about the platform:

lspci
00:00.0 Host bridge: Intel Corporation 5520/5500/X58 I/O Hub to ESI Port (rev 13)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 13)
00:04.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 4 (rev 13)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 13)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 13)
00:08.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 8 (rev 13)
00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 13)
00:0a.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 10 (rev 13)
00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 13)
00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 13)
00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 13)
00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 13)
00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.0 Ethernet controller: Intel Corporation Device 10f7
03:00.1 Ethernet controller: Intel Corporation Device 10f7
04:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
04:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
06:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 08)

And here is the log output when the problem occurs:

mpt_debug_level=ffffffh
mptbase: ioc1: : mpt_adapter_install
mptsas 0000:06:00.0: enabling device (0000 -> 0002)
mptsas 0000:06:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
mptbase: ioc1: : 32 BIT PCI BUS DMA ADDRESSING SUPPORTED
mptbase: ioc1: mem = ffffc90010570000, mem_phys = d8010000
mptbase: ioc1: facts @ ffff88003d5df41c, pfacts[0] @ ffff88003d5df46c
mptbase: ioc1: Initiating bringup
mptbase: ioc1: MakeIocReady [raw] state=10000000
mptbase: ioc1: IOC is in READY state
  03000000 00000000 00000000
  03140105 00001400 00000000 00000000 00000000 00080022 002001ff 27040000 00000000 00010115 00000000 01700000 00000000 00000807 011b0000 00000100 00000000 00000000 00000000 00000000
  05000000 00000000 00000000
  050a0000 00000000 00000000 00000000 00000000 003f3000 00090070 00700001 00000000 00000000
ioc1: LSISAS1064E B3: Capabilities={Initiator}
mptbase: ioc1: installed at interrupt 16
mptbase: ioc1: PrimeIocFifos
mptbase: ioc1: SendIocInit
  02000004 00017000 00000000
  02050004 00017000 00000000 00000000 00000000
  06000000 00000000 00000000
DRHD: handling fault status reg 2
DMAR:[DMA Read] Request device [06:00.0] fault addr fffc2000
DMAR:[fault reason 06] PTE Read access is not set
mptbase: ioc1: WARNING - Issuing Reset from mpt_config!!
mptbase: ioc1: Initiating recovery
mptbase: ioc1: MakeIocReady [raw] state=40002000
mptbase: ioc1: WARNING - IOC is in FAULT state!!!
mptbase: ioc1: WARNING -            FAULT code = 2000h
mptbase: ioc1: Recovered from IOC FAULT
  03000000 00000000 00000000
  03140105 00001400 00000000 00000000 00000000 00080022 002001ff 27040000 00000000 00010115 00000000 01700000 00000000 00000807 011b0000 00000100 00000000 00000000 00000000 00000000
mptbase: ioc1: PrimeIocFifos
mptbase: ioc1: SendIocInit
  02000004 00017000 00000000
  02050004 00017000 00000000 00000000 00000000
  06000000 00000000 00000000
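To make the DMAR lines in the log above easier to correlate with the device, here is a small sketch that pulls the faulting device, access type, address and reason out of dmesg text. The regular expressions are only modelled on the three DMAR/DRHD lines shown in this log, and other kernel versions may format them differently. The function name parse_dmar_faults is made up for this illustration.

```python
import re

# Patterns modelled on the DMAR lines in the log above (an assumption for
# any other kernel version).
REQUEST_RE = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device \[([0-9a-fA-F:.]+)\] "
    r"fault addr ([0-9a-fA-F]+)"
)
REASON_RE = re.compile(r"DMAR:\[fault reason (\d+)\] (.+)")


def parse_dmar_faults(log_text):
    """Return one dict per DMAR fault found in the given log text."""
    faults = []
    for line in log_text.splitlines():
        match = REQUEST_RE.search(line)
        if match:
            faults.append({
                "access": match.group(1),
                "device": match.group(2),
                "addr": int(match.group(3), 16),
                "reason_code": None,
                "reason_text": None,
            })
            continue
        match = REASON_RE.search(line)
        # Attach the reason line to the most recent request line.
        if match and faults and faults[-1]["reason_code"] is None:
            faults[-1]["reason_code"] = match.group(1)
            faults[-1]["reason_text"] = match.group(2).strip()
    return faults


SAMPLE = """\
DRHD: handling fault status reg 2
DMAR:[DMA Read] Request device [06:00.0] fault addr fffc2000
DMAR:[fault reason 06] PTE Read access is not set
"""

for fault in parse_dmar_faults(SAMPLE):
    print("%s by %s at 0x%x: %s" % (
        fault["access"], fault["device"], fault["addr"], fault["reason_text"]))
```

Run against the full dmesg output, this shows whether every fault comes from 06:00.0 and whether the faulting address stays the same across retries.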


Here is my current hypothesis:

For some reason, on hotplug reactivation, the resources claimed by the IOC are not the same as those it used before, for which the IOMMU has a translation enabled, so subsequent DMA accesses are rejected.
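One way to check this hypothesis is to snapshot the device's sysfs resource file before removal and after re-insertion and compare the two. The sketch below assumes the usual format of /sys/bus/pci/devices/<BDF>/resource: one line per resource with start, end and flags in hex. The sample values are invented for illustration and are not measurements from this machine. The helper names parse_resource and diff_resources are hypothetical.

```python
def parse_resource(text):
    """Parse sysfs 'resource' file content into (start, end, flags) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not a start/end/flags triple
        start, end, flags = (int(field, 16) for field in fields)
        entries.append((start, end, flags))
    return entries


def diff_resources(before, after):
    """Return (index, old, new) for every resource entry that changed."""
    changed = []
    for index, (old, new) in enumerate(zip(before, after)):
        if old != new:
            changed.append((index, old, new))
    return changed


# Invented sample snapshots; real ones would be read from sysfs.
BEFORE = (
    "0x000000000000e000 0x000000000000e0ff 0x0000000000040101\n"
    "0x00000000d8010000 0x00000000d8013fff 0x0000000000140204\n"
)
AFTER = (
    "0x000000000000e000 0x000000000000e0ff 0x0000000000040101\n"
    "0x00000000d8110000 0x00000000d8113fff 0x0000000000140204\n"
)

for index, old, new in diff_resources(parse_resource(BEFORE), parse_resource(AFTER)):
    print("resource %d moved: 0x%x-0x%x -> 0x%x-0x%x" % (
        index, old[0], old[1], new[0], new[1]))
```

On the real system the two snapshots would come from reading /sys/bus/pci/devices/0000:06:00.0/resource once before the hot removal and once after the device comes back. Note that this only compares the resources assigned to the device; it says nothing about the DMA mappings the IOMMU holds.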

But I'm having a hard time figuring out where to look first. Should the assigned resources be exactly the same? How is the IOMMU supposed to deal with hot removal of PCI endpoint devices?

Thank you.
