Re: Memory allocation bug in pci hotplug

On Wed, 2009-11-18 at 21:03 +0000, Eric Biederman wrote:
> On Wed, Nov 18, 2009 at 11:39 AM, Patrick Keller <patrick.keller@xxxxxx> wrote:
> > I guess I'm now confused. Having seen a working log, where the card in
> > the slot comes back up after being hotplugged, and a non-working log,
> > where everything is broken: is the regressed, broken state better?
> 
> I may be misreading something, but I see a
> ``working'' configuration where no memory addresses are assigned
> to the cards, and a ``non-working'' configuration where at least
> prefetchable memory is assigned.
> 
> What I don't see in the ``working'' configuration is anything plugged
> into the hotplug slots.  Nothing in bus numbers > 18.

You're correct; we're actually operating on a non-hotplug slot. In the
working case the driver is able to claim the card, and in the
non-working case the modprobe fails.

> The failure mode as I see it is actually attempting to hotplug a card
> into those hotplug slots that don't have any resources. That
> won't work in either configuration, but at least in the second configuration
> you get an error telling you that something is wrong.
> 
> What card in the slot are you talking about that comes back up in one and
> fails in the other?  All I see is simulated hotplug on non-hotplug slots.

The test case is exercising logical hotplug, a.k.a. fakephp.  Fakephp
and the PCI core use the exact same hotplug code.

Logical hotplug should allow us to remove and re-add a device whether it
resides in a hotplug slot or not.  I agree that we are not talking about
physical hotplug.


Just as a reminder, we're doing a logical remove at the level of
0000:0b:00.0 and rescanning the entire PCI bus. The error we're seeing
is that the QLogic device isn't coming back.
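
For anyone reproducing this, the remove/rescan cycle can be scripted. What
follows is a minimal Python sketch and not part of the original test case.
The helper names and the sysfs_root parameter are hypothetical; the sketch
only assumes the usual sysfs layout (a per-device "remove" file, the
bus-wide "rescan" file, and a "driver" symlink on devices that a driver has
claimed). Writing to the real files needs root.

```python
import os

SYSFS_ROOT = "/sys"  # assumption: sysfs is mounted in the usual place


def device_dir(bdf, sysfs_root=SYSFS_ROOT):
    """Path of the sysfs directory for a PCI device such as 0000:13:00.0."""
    return os.path.join(sysfs_root, "bus", "pci", "devices", bdf)


def logical_remove(bdf, sysfs_root=SYSFS_ROOT):
    """Equivalent of: echo 1 > /sys/bus/pci/devices/<bdf>/remove"""
    with open(os.path.join(device_dir(bdf, sysfs_root), "remove"), "w") as f:
        f.write("1")


def rescan(sysfs_root=SYSFS_ROOT):
    """Equivalent of: echo 1 > /sys/bus/pci/rescan"""
    with open(os.path.join(sysfs_root, "bus", "pci", "rescan"), "w") as f:
        f.write("1")


def bound_driver(bdf, sysfs_root=SYSFS_ROOT):
    """Name of the driver bound to the device, or None if absent or unbound."""
    link = os.path.join(device_dir(bdf, sysfs_root), "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))
```

After logical_remove("0000:0b:00.0") and rescan(), a call to
bound_driver("0000:13:00.0") would be expected to return "qla2xxx" in the
working case and None in the failing one.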

Here's the lspci -tv output:
+-06.0-[0000:0b-15]----00.0-[0000:0c-15]--+-00.0-[0000:13-15]--+-00.0  QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA
|                                         |                    \-00.1  QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA
|                                         +-01.0-[0000:10-12]--
|                                         \-09.0-[0000:0d-0f]--


Here's the remove:

dl580g5:~# echo 1 > /sys/bus/pci/devices/0000\:0b\:00.0/remove
[  279.175405] qla2xxx 0000:13:00.0: PCI INT A disabled
[  279.195644] qla2xxx 0000:13:00.1: PCI INT B disabled
[  279.203038] pcieport-driver 0000:0c:00.0: PCI INT A disabled
[  279.210565] pcieport-driver 0000:0c:01.0: PCI INT A disabled
[  279.213557] pcieport-driver 0000:0c:09.0: PCI INT A disabled
[  279.219402] pcieport-driver 0000:0b:00.0: PCI INT A disabled

Here's the successful add. Note how the driver is able to claim the card
successfully:

dl580g5:~# echo 1 > /sys/bus/pci/rescan
[  289.280825] pci 0000:0c:00.0: PCI bridge, secondary bus 0000:13
[  289.288490] pci 0000:0c:00.0:   IO window: 0x5000-0x5fff
[  289.292747] pci 0000:0c:00.0:   MEM window: 0xfdd00000-0xfddfffff
[  289.302746] pci 0000:0c:00.0:   PREFETCH window: 0xd0200000-0xd02fffff
[  289.310911] pci 0000:0c:01.0: PCI bridge, secondary bus 0000:10
[  289.320406] pci 0000:0c:01.0:   IO window: disabled
[  289.330103] pci 0000:0c:01.0:   MEM window: disabled
[  289.333599] pci 0000:0c:01.0:   PREFETCH window: disabled
[  289.336802] pci 0000:0c:09.0: PCI bridge, secondary bus 0000:0d
[  289.344602] pci 0000:0c:09.0:   IO window: disabled
[  289.351192] pci 0000:0c:09.0:   MEM window: disabled
[  289.357376] pci 0000:0c:09.0:   PREFETCH window: disabled
[  289.363505] pci 0000:0b:00.0: PCI bridge, secondary bus 0000:0c
[  289.370990] pci 0000:0b:00.0:   IO window: 0x5000-0x5fff
[  289.373718] pci 0000:0b:00.0:   MEM window: 0xfdd00000-0xfddfffff
[  289.376754] pci 0000:0b:00.0:   PREFETCH window: 0xd0200000-0xd02fffff
[  289.380106] pci 0000:0b:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
[  289.390058] pci 0000:0c:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
[  289.395323] pci 0000:0c:01.0: PCI INT A -> GSI 29 (level, low) -> IRQ 29
[  289.402226] pci 0000:0c:09.0: PCI INT A -> GSI 29 (level, low) -> IRQ 29
[  289.406913] qla2xxx 0000:13:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
[  289.415366] qla2xxx 0000:13:00.0: Found an ISP2432, irq 28, iobase 0xffffc90014a74000
[  289.421343] qla2xxx 0000:13:00.0: Configuring PCI space...
[  289.457374] qla2xxx 0000:13:00.0: Configure NVRAM parameters...
[  289.497073] qla2xxx 0000:13:00.0: Verifying loaded RISC code...
[  289.516422] qla2xxx 0000:13:00.0: FW: Loading via request-firmware...
[  289.748425] qla2xxx 0000:13:00.0: Allocated (64 KB) for EFT...
[  289.753977] qla2xxx 0000:13:00.0: Allocated (1285 KB) for firmware dump...
[  289.772470] scsi4 : qla2xxx
[  289.775004] qla2xxx 0000:13:00.0: 
[  289.775005]  QLogic Fibre Channel HBA Driver: 8.03.01-k4
[  289.775007]   QLogic HPAE312A - PCI-Express Dual Port 4Gb Fibre Channel HBA
[  289.775008]   ISP2432: PCIe (2.5GT/s x4) @ 0000:13:00.0 hdma+, host#=4, fw=4.04.00 (482)
[  289.809603] qla2xxx 0000:13:00.1: PCI INT B -> GSI 29 (level, low) -> IRQ 29
[  289.815443] qla2xxx 0000:13:00.1: Found an ISP2432, irq 29, iobase 0xffffc90014bbc000
[  289.819622] qla2xxx 0000:13:00.1: Configuring PCI space...
[  289.856379] qla2xxx 0000:13:00.1: Configure NVRAM parameters...
[  289.898584] qla2xxx 0000:13:00.1: Verifying loaded RISC code...
[  289.920050] qla2xxx 0000:13:00.1: FW: Loading via request-firmware...
[  290.039841] qla2xxx 0000:13:00.0: LIP reset occurred (f700).
[  290.084751] qla2xxx 0000:13:00.0: LOOP UP detected (4 Gbps).
[  290.156045] qla2xxx 0000:13:00.1: Allocated (64 KB) for EFT...
[  290.160094] qla2xxx 0000:13:00.1: Allocated (1285 KB) for firmware dump...
[  290.180082] scsi5 : qla2xxx
[  290.187344] qla2xxx 0000:13:00.1: 
[  290.187345]  QLogic Fibre Channel HBA Driver: 8.03.01-k4
[  290.187346]   QLogic HPAE312A - PCI-Express Dual Port 4Gb Fibre Channel HBA
[  290.187348]   ISP2432: PCIe (2.5GT/s x4) @ 0000:13:00.1 hdma+, host#=5, fw=4.04.00 (482)

Here's the bad case. I see what you're saying about the prefetch windows
getting set, but why is the probe failing where it wasn't before?

dl580g5:~# echo 1 > /sys/bus/pci/rescan

[   60.481211] pcieport-driver 0000:17:00.0: BAR 8: can't allocate mem resource [0xfe000000-0xfdffffff]
[   60.486438] pcieport-driver 0000:18:01.0: BAR 8: can't allocate mem resource [0x100000-0x4fffff]
[   60.492554] pcieport-driver 0000:18:08.0: BAR 8: can't allocate mem resource [0x100000-0x4fffff]
[   60.498928] pcieport-driver 0000:18:09.0: BAR 8: can't allocate mem resource [0x100000-0x4fffff]
[   60.505749] pci 0000:0b:00.0: BAR 8: can't allocate mem resource [0xfdd00000-0xfdefffff]
[   60.511608] pci 0000:0b:00.0: BAR 7: can't allocate I/O resource [0x5000-0x5fff]
[   60.515337] pci 0000:0c:00.0: BAR 8: can't allocate mem resource [0x100000-0x1fffff]
[   60.522030] pci 0000:0c:01.0: BAR 8: can't allocate mem resource [0x100000-0x2fffff]
[   60.527686] pci 0000:0c:09.0: BAR 8: can't allocate mem resource [0x100000-0x2fffff]
[   60.531453] pci 0000:0c:00.0: BAR 7: can't allocate I/O resource [0x1000-0x1fff]
[   60.546448] pci 0000:0c:01.0: BAR 7: can't allocate I/O resource [0x1000-0x1fff]
[   60.551934] pci 0000:0c:09.0: BAR 7: can't allocate I/O resource [0x1000-0x1fff]
[   60.564610] pci 0000:13:00.0: BAR 1: can't allocate mem resource [0xfdef0000-0xfdef3fff]
[   60.569948] pci 0000:13:00.1: BAR 1: can't allocate mem resource [0xfdee0000-0xfdee3fff]
[   60.578430] pci 0000:13:00.0: BAR 0: can't allocate I/O resource [0x5000-0x50ff]
[   60.591485] pci 0000:13:00.1: BAR 0: can't allocate I/O resource [0x5400-0x54ff]
[   60.596247] pci 0000:0c:00.0: PCI bridge, secondary bus 0000:13
[   60.599421] pci 0000:0c:00.0:   IO window: disabled
[   60.601927] pci 0000:0c:00.0:   MEM window: disabled
[   60.606043] pci 0000:0c:00.0:   PREFETCH window: 0xd0800000-0xd08fffff
[   60.613557] pci 0000:0c:01.0: PCI bridge, secondary bus 0000:10
[   60.617539] pci 0000:0c:01.0:   IO window: disabled
[   60.621011] pci 0000:0c:01.0:   MEM window: disabled
[   60.624132] pci 0000:0c:01.0:   PREFETCH window: 0x000000d0900000-0x000000d0afffff
[   60.628719] pci 0000:0c:09.0: PCI bridge, secondary bus 0000:0d
[   60.635274] pci 0000:0c:09.0:   IO window: disabled
[   60.637744] pci 0000:0c:09.0:   MEM window: disabled
[   60.645612] pci 0000:0c:09.0:   PREFETCH window: 0x000000d0b00000-0x000000d0cfffff
[   60.653157] pci 0000:0b:00.0: PCI bridge, secondary bus 0000:0c
[   60.659374] pci 0000:0b:00.0:   IO window: disabled
[   60.662084] pci 0000:0b:00.0:   MEM window: disabled
[   60.665175] pci 0000:0b:00.0:   PREFETCH window: 0xd0800000-0xd0cfffff
[   60.670665] pci 0000:0b:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
[   60.676783] pci 0000:0c:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
[   60.681139] pci 0000:0c:01.0: PCI INT A -> GSI 29 (level, low) -> IRQ 29
[   60.686382] pci 0000:0c:09.0: PCI INT A -> GSI 29 (level, low) -> IRQ 29
[   60.692158] qla2xxx 0000:13:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
[   60.698829] qla2xxx 0000:13:00.0: region #1 not an MMIO resource (0000:13:00.0), aborting
[   60.704788] qla2xxx 0000:13:00.0: PCI INT A disabled
[   60.708125] qla2xxx: probe of 0000:13:00.0 failed with error -12
[   60.716643] qla2xxx 0000:13:00.1: PCI INT B -> GSI 29 (level, low) -> IRQ 29
[   60.722056] qla2xxx 0000:13:00.1: region #1 not an MMIO resource (0000:13:00.1), aborting
[   60.729427] qla2xxx 0000:13:00.1: PCI INT B disabled
[   60.732098] qla2xxx: probe of 0000:13:00.1 failed with error -12
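
To make the failures in a log like the one above easier to scan, here is a
small Python sketch that pulls out the failed BAR allocations and the failed
probes. It is only an illustration: the function names are hypothetical and
the regular expressions are written against the message formats shown in
this mail, not against any kernel interface. Error 12 is ENOMEM on Linux,
which is how the "-12" is decoded. Ranges are inclusive, so a range whose
end is one below its start, such as [0xfe000000-0xfdffffff], comes out with
size 0.

```python
import errno
import re

# \s+ between tokens tolerates the line wrapping added by mail clients.
BAR_FAIL = re.compile(
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"BAR\s+(?P<bar>\d+):\s+can't\s+allocate\s+(?P<kind>mem|I/O)\s+resource\s+"
    r"\[(?P<start>0x[0-9a-f]+)-(?P<end>0x[0-9a-f]+)\]"
)
PROBE_FAIL = re.compile(
    r"probe\s+of\s+(?P<bdf>\S+)\s+failed\s+with\s+error\s+-(?P<err>\d+)"
)


def failed_bars(log):
    """Return (bdf, bar, kind, start, end, size) for each failed allocation."""
    out = []
    for m in BAR_FAIL.finditer(log):
        start, end = int(m["start"], 16), int(m["end"], 16)
        # Inclusive range: end == start - 1 means an empty request.
        size = end - start + 1
        out.append((m["bdf"], int(m["bar"]), m["kind"], start, end, size))
    return out


def failed_probes(log):
    """Return (bdf, errno name) for each 'probe of ... failed' message."""
    return [
        (m["bdf"], errno.errorcode.get(int(m["err"]), "unknown"))
        for m in PROBE_FAIL.finditer(log)
    ]
```

Run over the bad-case log, this reports BAR 1 of 0000:13:00.0 and
0000:13:00.1 as unallocated 16 KB memory regions. That lines up with the
later "region #1 not an MMIO resource" messages, though the link is an
inference from the log and not something established in this thread.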

Here's a diff of two boots: goodboot.txt is without your patch and
badboot.txt is with your patch.

pkeller@pLaptop:~$ diff -Nurp goodboot.txt badboot.txt 
--- goodboot.txt	2009-11-18 17:03:28.000000000 -0700
+++ badboot.txt	2009-11-18 17:06:04.000000000 -0700
@@ -38,58 +38,66 @@ pci 0000:00:03.0: PCI bridge, secondary 
 pci 0000:00:03.0:   IO window: disabled
 pci 0000:00:03.0:   MEM window: disabled
 pci 0000:00:03.0:   PREFETCH window: disabled
+pci 0000:17:00.0: BAR 8: can't allocate mem resource [0xfe000000-0xfdffffff]
+pci 0000:18:01.0: BAR 8: can't allocate mem resource [0x100000-0x2fffff]
+pci 0000:18:08.0: BAR 8: can't allocate mem resource [0x100000-0x2fffff]
+pci 0000:18:09.0: BAR 8: can't allocate mem resource [0x100000-0x2fffff]
 pci 0000:18:01.0: PCI bridge, secondary bus 0000:25
-pci 0000:18:01.0:   IO window: disabled
+pci 0000:18:01.0:   IO window: 0x6000-0x6fff
 pci 0000:18:01.0:   MEM window: disabled
-pci 0000:18:01.0:   PREFETCH window: disabled
+pci 0000:18:01.0:   PREFETCH window: 0x000000d0200000-0x000000d03fffff
 pci 0000:18:02.0: PCI bridge, secondary bus 0000:22
 pci 0000:18:02.0:   IO window: disabled
 pci 0000:18:02.0:   MEM window: disabled
 pci 0000:18:02.0:   PREFETCH window: disabled
 pci 0000:18:08.0: PCI bridge, secondary bus 0000:19
-pci 0000:18:08.0:   IO window: disabled
+pci 0000:18:08.0:   IO window: 0x7000-0x7fff
 pci 0000:18:08.0:   MEM window: disabled
-pci 0000:18:08.0:   PREFETCH window: disabled
+pci 0000:18:08.0:   PREFETCH window: 0x000000d0400000-0x000000d05fffff
 pci 0000:18:09.0: PCI bridge, secondary bus 0000:1c
-pci 0000:18:09.0:   IO window: disabled
+pci 0000:18:09.0:   IO window: 0x8000-0x8fff
 pci 0000:18:09.0:   MEM window: disabled
-pci 0000:18:09.0:   PREFETCH window: disabled
+pci 0000:18:09.0:   PREFETCH window: 0x000000d0600000-0x000000d07fffff
 pci 0000:18:0a.0: PCI bridge, secondary bus 0000:1f
 pci 0000:18:0a.0:   IO window: disabled
 pci 0000:18:0a.0:   MEM window: disabled
 pci 0000:18:0a.0:   PREFETCH window: disabled
 pci 0000:17:00.0: PCI bridge, secondary bus 0000:18
-pci 0000:17:00.0:   IO window: disabled
+pci 0000:17:00.0:   IO window: 0x6000-0x8fff
 pci 0000:17:00.0:   MEM window: disabled
-pci 0000:17:00.0:   PREFETCH window: disabled
+pci 0000:17:00.0:   PREFETCH window: 0x000000d0200000-0x000000d07fffff
 pci 0000:00:04.0: PCI bridge, secondary bus 0000:17
-pci 0000:00:04.0:   IO window: disabled
+pci 0000:00:04.0:   IO window: 0x6000-0x8fff
 pci 0000:00:04.0:   MEM window: 0xfdf00000-0xfdffffff
-pci 0000:00:04.0:   PREFETCH window: disabled
+pci 0000:00:04.0:   PREFETCH window: 0x000000d0200000-0x000000d07fffff
 pci 0000:00:05.0: PCI bridge, secondary bus 0000:28
 pci 0000:00:05.0:   IO window: disabled
 pci 0000:00:05.0:   MEM window: disabled
 pci 0000:00:05.0:   PREFETCH window: disabled
+pci 0000:0c:01.0: BAR 8: can't allocate mem resource [0xfdf00000-0xfdefffff]
+pci 0000:0c:09.0: BAR 8: can't allocate mem resource [0xfdf00000-0xfdefffff]
+pci 0000:0c:01.0: BAR 7: can't allocate I/O resource [0x6000-0x5fff]
+pci 0000:0c:09.0: BAR 7: can't allocate I/O resource [0x6000-0x5fff]
 pci 0000:0c:00.0: PCI bridge, secondary bus 0000:13
 pci 0000:0c:00.0:   IO window: 0x5000-0x5fff
 pci 0000:0c:00.0:   MEM window: 0xfde00000-0xfdefffff
-pci 0000:0c:00.0:   PREFETCH window: 0xd0200000-0xd02fffff
+pci 0000:0c:00.0:   PREFETCH window: 0xd0800000-0xd08fffff
 pci 0000:0c:01.0: PCI bridge, secondary bus 0000:10
 pci 0000:0c:01.0:   IO window: disabled
 pci 0000:0c:01.0:   MEM window: disabled
-pci 0000:0c:01.0:   PREFETCH window: disabled
+pci 0000:0c:01.0:   PREFETCH window: 0x000000d0900000-0x000000d0afffff
 pci 0000:0c:09.0: PCI bridge, secondary bus 0000:0d
 pci 0000:0c:09.0:   IO window: disabled
 pci 0000:0c:09.0:   MEM window: disabled
-pci 0000:0c:09.0:   PREFETCH window: disabled
+pci 0000:0c:09.0:   PREFETCH window: 0x000000d0b00000-0x000000d0cfffff
 pci 0000:0b:00.0: PCI bridge, secondary bus 0000:0c
 pci 0000:0b:00.0:   IO window: 0x5000-0x5fff
 pci 0000:0b:00.0:   MEM window: 0xfde00000-0xfdefffff
-pci 0000:0b:00.0:   PREFETCH window: 0xd0200000-0xd02fffff
+pci 0000:0b:00.0:   PREFETCH window: 0xd0800000-0xd0cfffff
 pci 0000:00:06.0: PCI bridge, secondary bus 0000:0b
 pci 0000:00:06.0:   IO window: 0x5000-0x5fff
 pci 0000:00:06.0:   MEM window: 0xfdd00000-0xfdefffff
-pci 0000:00:06.0:   PREFETCH window: 0xd0200000-0xd02fffff
+pci 0000:00:06.0:   PREFETCH window: 0xd0800000-0xd0cfffff
 pci 0000:00:07.0: PCI bridge, secondary bus 0000:16
 pci 0000:00:07.0:   IO window: disabled
 pci 0000:00:07.0:   MEM window: disabled
@@ -97,7 +105,7 @@ pci 0000:00:07.0:   PREFETCH window: dis
 pci 0000:00:1c.0: PCI bridge, secondary bus 0000:02
 pci 0000:00:1c.0:   IO window: 0x4000-0x4fff
 pci 0000:00:1c.0:   MEM window: 0xf7d00000-0xf7efffff
-pci 0000:00:1c.0:   PREFETCH window: 0xd0300000-0xd03fffff
+pci 0000:00:1c.0:   PREFETCH window: 0xd0d00000-0xd0dfffff
 pci 0000:00:1e.0: PCI bridge, secondary bus 0000:01
 pci 0000:00:1e.0:   IO window: 0x2000-0x3fff
 pci 0000:00:1e.0:   MEM window: 0xf7b00000-0xf7cfffff
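
The same comparison can be done per bridge instead of per line. The Python
sketch below is illustrative only: the function names are hypothetical and
the pattern assumes the "IO/MEM/PREFETCH window:" message format shown in
these logs. It collects the windows reported for each bridge and lists the
ones that differ between two boot logs. Values are compared as strings, so
a change in zero padding alone would also show up as a difference.

```python
import re

# \s* after "window:" tolerates a value wrapped onto the next line.
WINDOW = re.compile(
    r"pci\s+(?P<bdf>\S+):\s+(?P<kind>IO|MEM|PREFETCH) window:\s*(?P<val>\S+)"
)
KINDS = ("IO", "MEM", "PREFETCH")


def bridge_windows(log):
    """Map bridge address -> {window kind: reported value}; last report wins."""
    windows = {}
    for m in WINDOW.finditer(log):
        windows.setdefault(m["bdf"], {})[m["kind"]] = m["val"]
    return windows


def changed_windows(good_log, bad_log):
    """Return (bdf, kind, old, new) for every window that differs."""
    good, bad = bridge_windows(good_log), bridge_windows(bad_log)
    changes = []
    for bdf in sorted(set(good) | set(bad)):
        for kind in KINDS:
            old = good.get(bdf, {}).get(kind)
            new = bad.get(bdf, {}).get(kind)
            if old != new:
                changes.append((bdf, kind, old, new))
    return changes
```

Fed the full text of goodboot.txt and badboot.txt, changed_windows() would
list entries such as the PREFETCH window of 0000:0c:01.0 going from
"disabled" to an assigned range.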

Here's the /proc/iomem output from both the good boot without your
patch, and the bad boot with your patch.

dl580g5:~# diff -Nurp goodiomem.txt badiomem.txt
--- goodiomem.txt	2009-11-18 17:14:10.000000000 -0700
+++ badiomem.txt	2009-11-18 17:10:31.000000000 -0700
@@ -2,8 +2,8 @@
 0009f400-0009ffff : reserved
 000f0000-000fffff : reserved
 00100000-cfd42fff : System RAM
-  01000000-012a1da6 : Kernel code
-  012a1da7-0143cb5f : Kernel data
+  01000000-012a1e46 : Kernel code
+  012a1e47-0143cb5f : Kernel data
   014d2000-015991bb : Kernel bss
 cfd43000-cfd4bfff : ACPI Tables
 cfd4c000-cfd4cfff : System RAM
@@ -16,13 +16,20 @@ d0000000-d01fffff : PCI Bus 0000:03
     d0100000-d01fffff : PCI Bus 0000:07
       d0100000-d01fffff : PCI Bus 0000:08
         d0100000-d011ffff : 0000:08:00.0
-d0200000-d02fffff : PCI Bus 0000:0b
-  d0200000-d02fffff : PCI Bus 0000:0c
-    d0200000-d02fffff : PCI Bus 0000:13
-      d0200000-d023ffff : 0000:13:00.0
-      d0240000-d027ffff : 0000:13:00.1
-d0300000-d03fffff : PCI Bus 0000:02
-  d0300000-d033ffff : 0000:02:00.0
+d0200000-d07fffff : PCI Bus 0000:17
+  d0200000-d07fffff : PCI Bus 0000:18
+    d0200000-d03fffff : PCI Bus 0000:25
+    d0400000-d05fffff : PCI Bus 0000:19
+    d0600000-d07fffff : PCI Bus 0000:1c
+d0800000-d0cfffff : PCI Bus 0000:0b
+  d0800000-d0cfffff : PCI Bus 0000:0c
+    d0800000-d08fffff : PCI Bus 0000:13
+      d0800000-d083ffff : 0000:13:00.0
+      d0840000-d087ffff : 0000:13:00.1
+    d0900000-d0afffff : PCI Bus 0000:10
+    d0b00000-d0cfffff : PCI Bus 0000:0d
+d0d00000-d0dfffff : PCI Bus 0000:02
+  d0d00000-d0d3ffff : 0000:02:00.0
 d8000000-dfffffff : PCI Bus 0000:01
   d8000000-dfffffff : 0000:01:03.0
 e0000000-efffffff : PCI MMCONFIG 0 [00-ff]
@@ -59,13 +66,13 @@ f7f00000-fbffffff : PCI Bus 0000:03
         fa000000-fbffffff : 0000:08:00.0
           fa000000-fbffffff : bnx2
 fdd00000-fdefffff : PCI Bus 0000:0b
-  fdd00000-fddfffff : PCI Bus 0000:0c
-    fdd00000-fddfffff : PCI Bus 0000:13
-      fdd00000-fdd03fff : 0000:13:00.0
-        fdd00000-fdd03fff : qla2xxx
-      fdd04000-fdd07fff : 0000:13:00.1
-        fdd04000-fdd07fff : qla2xxx
-  fde00000-fde1ffff : 0000:0b:00.0
+  fdde0000-fddfffff : 0000:0b:00.0
+  fde00000-fdefffff : PCI Bus 0000:0c
+    fde00000-fdefffff : PCI Bus 0000:13
+      fdee0000-fdee3fff : 0000:13:00.1
+        fdee0000-fdee3fff : qla2xxx
+      fdef0000-fdef3fff : 0000:13:00.0
+        fdef0000-fdef3fff : qla2xxx
 fdf00000-fdffffff : PCI Bus 0000:17
   fdfe0000-fdffffff : 0000:17:00.0
 fe000000-febfffff : pnp 00:01
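
To see where a device's regions landed without reading the diff by eye, the
/proc/iomem text can be parsed directly. This is a minimal Python sketch
with hypothetical function names; it assumes the layout shown above, with
two spaces of indentation per nesting level and "start-end : name" on each
line. It works on the raw file contents, not on the diff output.

```python
def parse_iomem(text):
    """Parse /proc/iomem-style text into (depth, start, end, name) tuples."""
    entries = []
    for line in text.splitlines():
        if " : " not in line:
            continue
        rng, name = line.split(" : ", 1)
        depth = (len(rng) - len(rng.lstrip(" "))) // 2  # two spaces per level
        start, end = (int(x, 16) for x in rng.strip().split("-"))
        entries.append((depth, start, end, name.strip()))
    return entries


def regions_with_parent(entries, name):
    """Return (region, parent) pairs for entries called `name`.

    parent is the enclosing entry one level up, or None at top level.
    """
    stack = []  # stack[d] holds the most recent entry seen at depth d
    result = []
    for entry in entries:
        depth = entry[0]
        del stack[depth:]  # drop siblings and deeper entries
        if entry[3] == name:
            result.append((entry, stack[-1] if stack else None))
        stack.append(entry)
    return result
```

For example, regions_with_parent(parse_iomem(text), "0000:13:00.0") returns
each region of that function together with the "PCI Bus 0000:13" window
that contains it, which makes it easy to compare the good and bad boots.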

