On Tue, Sep 18, 2012 at 7:50 PM, Yijing Wang <wangyijing@xxxxxxxxxx> wrote:
> On 2012/9/19 1:54, Bjorn Helgaas wrote:
>> On Mon, Sep 17, 2012 at 6:06 AM, Yijing Wang <wangyijing@xxxxxxxxxx> wrote:
>>> On 2012/9/16 11:30, Bjorn Helgaas wrote:
>>>> On Sat, Sep 15, 2012 at 4:22 AM, Yijing Wang <wangyijing@xxxxxxxxxx> wrote:
>>>>> Hi all,
>>>>> I encountered a very strange problem when I hot plug a fiber channel card(using qla2xxx driver).
>>>>> I did the hotplug in arch x86 machine, using pciehp driver for hotplug, this platform supports pci hot-plug triggering from both
>>>>> sysfs and attention button. If a hot-plug slot is empty when system boot-up, then hotplug FC card in this slot is ok.
>>>>> If a hot-plug slot has been embedded a FC card when system boot-up, hot-remove this card is ok, but hot-add this card will fail.
>>>>> I used
>>>>> #modprobe qla2xxx ql2xextended_error_logging=0x7fffffff
>>>>> to get all probe info. As below:
>>>>>
>>>>> Can anyone give me any suggestion for this problem?
>>>>
>>>> It sounds like you did this:
>>>>
>>>>   1) Power down system
>>>>   2) Remove FC card from slot
>>>>   3) Boot system
>>>>   4) Hot-add FC card
>>>>   5) Load qla2xxx driver
>>>>   6) qla2xxx driver claims FC card
>>>>   7) FC card works correctly
>>>>
>>>>   8) Power down system
>>>>   9) Install FC card in slot
>>>>   10) Boot system
>>>>   11) Load qla2xxx driver
>>>>   12) qla2xxx driver claims FC card
>>>>   13) FC card works correctly
>>> I rmmod qla2xxx driver here and modprobe qla2xxx ql2xextended_error_logging=0x1e400000 again for get errors info
>>> Also I modprobe pciehp pciehp_debug=1 for getting debug info
>>>>   14) Hot-remove card
>>>>   15) Hot-add card
>>>>   16) qla2xxx driver claims FC card
>>>>   17) FC card does not work
>>>>
>>>> and I assume the dmesg log you included is just from steps 15 and 16
>>>> (correct me if I'm wrong).
>>>>
>>>> It would be useful to see the entire log showing all these events so
>>>> we can compare the working cases with the non-working one.
>>>> If you use the pciehp_debug module parameter, we should also see some
>>>> pciehp events that would help me understand that driver.
>>>>
>>>
>>> Hi Bjorn,
>>> Thanks for your comments very much!
>>>
>>> My steps:
>>> 1) power down system
>>> 2) Install FC card in slot
>>> 3) Boot system
>>> 4) Load qla2xxx driver
>>> 5) qla2xxx driver claims FC card
>>> 6) FC card works correctly(at least probe return ok, I don't know qla2xxx driver much..)
>>> 7) rmmod qla2xxx
>>> 8) modprobe qla2xxx ql2xextended_error_logging=0x1e400000(for get errors info)
>>> 9) modprobe pciehp pciehp_debug=1
>>> 10) Hot-remove card
>>> 11) Hot-add card
>>> 12) qla2xxx driver claims FC card fail(probe return fail, setup chip fail)
>>> --------------------------------------so this is failed situation----------
>>>
>>> --------------------------------------continue to hot-add fc card into empty slot(also support pci hp)
>>> 13) Install FC card in empty slot
>>> 14) Hot-add card
>>> 15) qla2xxx driver claims FC card ok (probe return ok)
>>>
>>> btw:
>>> If fc card firmware version 4.03, everything is ok (hot-plug in any slots(empty or not))
>>> fc card firmware version is 4.04 or 5.04 , situation as same as 1)--->12)
>>
>> Thanks.  The FW change is a good clue.  If everything works with
>> version 4.03, but it doesn't work with version 4.04, it's likely to be
>> a FW problem, not a Linux PCI core problem.
>>
>> Here's what I see from your logs.  In slot 4 (bus 08), the card was
>> present before boot, you removed it, re-added it, and it failed after
>> being re-added.  Slot 3 (bus 06) was empty at boot, you hot-added a
>> card, and it worked.
>> Here are the resources available on those two buses and the boot-time
>> config of the first device in slot 4:
>>
>>   pci 0000:00:07.0: PCI bridge to [bus 06-07]
>>   pci 0000:00:07.0:   bridge window [io 0xc000-0xcfff]
>>   pci 0000:00:07.0:   bridge window [mem 0xf9000000-0xf9ffffff]
>>   pci 0000:00:07.0:   bridge window [mem 0xf1000000-0xf1ffffff 64bit pref]
>>   pci 0000:00:09.0: PCI bridge to [bus 08-09]
>>   pci 0000:00:09.0:   bridge window [io 0xb000-0xbfff]
>>   pci 0000:00:09.0:   bridge window [mem 0xf8000000-0xf8ffffff]
>>   pci 0000:00:09.0:   bridge window [mem 0xf0000000-0xf0ffffff 64bit pref]
>>   pci 0000:08:00.0: [1077:2532] type 00 class 0x0c0400
>>   pci 0000:08:00.0: reg 10: [io 0xb100-0xb1ff]
>>   pci 0000:08:00.0: reg 14: [mem 0xf8084000-0xf8087fff 64bit]
>>   pci 0000:08:00.0: reg 30: [mem 0xf8040000-0xf807ffff pref]
>>
>> After you remove and re-add the card in slot 4, it starts with
>> uninitialized BARs as expected, then we assign resources to it.  It's
>> sort of interesting that the BIOS had originally put the ROM (reg 30)
>> in the non-prefetchable window, while after the hot-add, Linux places
>> it in the prefetchable window.  Either should work, and in fact the
>> card you added in slot 3 *does* work with its ROM in the prefetchable
>> window.
>>
>>   pci 0000:08:00.0: [1077:2532] type 00 class 0x0c0400
>>   pci 0000:08:00.0: reg 10: [io 0x0000-0x00ff]
>>   pci 0000:08:00.0: reg 14: [mem 0x00000000-0x00003fff 64bit]
>>   pci 0000:08:00.0: reg 30: [mem 0x00000000-0x0003ffff pref]
>>   pci 0000:08:00.0: BAR 0: assigned [io 0xb000-0xb0ff]
>>   pci 0000:08:00.0: BAR 1: assigned [mem 0xf8000000-0xf8003fff 64bit]
>>   pci 0000:08:00.0: BAR 6: assigned [mem 0xf0000000-0xf003ffff pref]
>>   qla2xxx [0000:08:00.0]-0098:10: Failed to load segment 0 of firmware.
>>   qla2xxx [0000:08:00.0]-d008:10: No buffer available for dump.
>>   qla2xxx [0000:08:00.0]-008f:10: Failed to load segment 0 of firmware.
>>   qla2xxx [0000:08:00.0]-00cf:10: Setup chip ****FAILED****.
>>
>> When you hot-add the card in slot 3, it starts with uninitialized BARs
>> as expected, but again, we assign valid resources to it:
>>
>>   pci 0000:06:00.0: [1077:2532] type 00 class 0x0c0400
>>   pci 0000:06:00.0: reg 10: [io 0x0000-0x00ff]
>>   pci 0000:06:00.0: reg 14: [mem 0x00000000-0x00003fff 64bit]
>>   pci 0000:06:00.0: reg 30: [mem 0x00000000-0x0003ffff pref]
>>   pci 0000:06:00.0: BAR 0: assigned [io 0xc000-0xc0ff]
>>   pci 0000:06:00.0: BAR 1: assigned [mem 0xf9000000-0xf9003fff 64bit]
>>   pci 0000:06:00.0: BAR 6: assigned [mem 0xf1000000-0xf103ffff pref]
>>
>> I don't see anything wrong from a PCI perspective.  I suspect
>> something strange in the card firmware.
>>
>> If you do figure out something wrong in PCI, let me know.
>>
>> Bjorn
>>
>
> Hi Bjorn,
> Thanks for your detailed analysis very much!
>
> We compared the two situations after BIOS initialization, and found Max Payload Size in DEVCTRL is 256B
> if FC card had been installed, if the slot is empty, Max Payload Size is 128B. We force it to be 128B when
> FC card installed when system boot up. Finally pci hotplug becomes ok. So I suspect maybe our PCIe hardware
> has problem supporting 256B.

Ah, this sounds like something I've been worried about for a while,
i.e., do we handle MPS correctly when we hot-add devices?

Yijing, I'm not quite clear on what you're observing.  I guess you're
saying that if an FC card is installed at boot, the BIOS sets MPS to
256, and that if no FC card is installed, the BIOS sets MPS to 128?

You haven't mentioned any Linux boot options, so I assume you haven't
tried any.  Does "pci=pcie_bus_safe" make any difference?

Jon, here's a pointer to the beginning of the thread:
http://marc.info/?l=linux-pci&m=134770460302298&w=2 (full dmesg log at
http://marc.info/?l=linux-scsi&m=134788365823217&w=2).

I'm not sure we have enough in the dmesg log to diagnose an issue like this.
I wonder if it would be useful to log the current setting, so we could
notice BIOS default differences like this one.

Bjorn
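
A minimal sketch of what logging the current setting might involve,
assuming the standard PCIe encoding in which Max_Payload_Size occupies
bits 7:5 of the Device Control register and the payload is 128 << field
bytes.  This is standalone C for illustration, not kernel code:
devctl_mps_bytes(), format_mps_line(), the mask macros, and the message
text are all hypothetical, and the register value would have to be read
from the device's config space.

```c
#include <stdint.h>
#include <stdio.h>

/* Max_Payload_Size field of the PCIe Device Control register (bits 7:5). */
#define DEVCTL_MPS_MASK  0x00e0u
#define DEVCTL_MPS_SHIFT 5

/*
 * Hypothetical helper: decode the MPS, in bytes, from a raw Device Control
 * value.  Encodings 0..5 mean 128..4096 bytes; 6 and 7 are reserved, so
 * return 0 for those.
 */
static unsigned int devctl_mps_bytes(uint16_t devctl)
{
	unsigned int field = (devctl & DEVCTL_MPS_MASK) >> DEVCTL_MPS_SHIFT;

	return field <= 5 ? 128u << field : 0;
}

/*
 * Hypothetical helper: format one log line per device so that settings
 * left by the BIOS can be compared between boots.  The message text is
 * made up for this sketch.
 */
static int format_mps_line(char *buf, size_t len, const char *name,
			   uint16_t devctl)
{
	return snprintf(buf, len, "pci %s: Max Payload Size %u bytes",
			name, devctl_mps_bytes(devctl));
}
```

For the two cases described in this thread, a field value of 1 decodes
to 256 bytes (card present at boot) and a field value of 0 to 128 bytes
(slot empty at boot).  On a running system, "lspci -vv" reports the same
field as MaxPayload on the DevCtl line.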