On 2012/9/19 7:49, Giridhar Malavali wrote:
>
> On 9/18/12 10:54 AM, "Bjorn Helgaas" <bhelgaas@xxxxxxxxxx> wrote:
>
>> On Mon, Sep 17, 2012 at 6:06 AM, Yijing Wang <wangyijing@xxxxxxxxxx>
>> wrote:
>>> On 2012/9/16 11:30, Bjorn Helgaas wrote:
>>>> On Sat, Sep 15, 2012 at 4:22 AM, Yijing Wang <wangyijing@xxxxxxxxxx>
>>>> wrote:
>>>>> Hi all,
>>>>> I encountered a very strange problem when I hot-plug a Fibre
>>>>> Channel card (using the qla2xxx driver).
>>>>> I did the hotplug on an x86 machine, using the pciehp driver for
>>>>> hotplug. This platform supports triggering PCI hot-plug from both
>>>>> sysfs and the attention button. If a hot-plug slot is empty when
>>>>> the system boots, then hot-plugging an FC card in this slot is ok.
>>>>> If a hot-plug slot already holds an FC card when the system boots,
>>>>> hot-removing this card is ok, but hot-adding this card will fail.
>>>>> I used
>>>>> #modprobe qla2xxx ql2xextended_error_logging=0x7fffffff
>>>>> to get all probe info. As below:
>>>>>
>>>>> Can anyone give me any suggestion for this problem?
>>>>
>>>> It sounds like you did this:
>>>>
>>>> 1) Power down system
>>>> 2) Remove FC card from slot
>>>> 3) Boot system
>>>> 4) Hot-add FC card
>>>> 5) Load qla2xxx driver
>>>> 6) qla2xxx driver claims FC card
>>>> 7) FC card works correctly
>>>>
>>>> 8) Power down system
>>>> 9) Install FC card in slot
>>>> 10) Boot system
>>>> 11) Load qla2xxx driver
>>>> 12) qla2xxx driver claims FC card
>>>> 13) FC card works correctly
>>> I rmmod the qla2xxx driver here and modprobe qla2xxx
>>> ql2xextended_error_logging=0x1e400000 again to get error info.
>>> I also modprobe pciehp pciehp_debug=1 to get debug info.
>>>> 14) Hot-remove card
>>>> 15) Hot-add card
>>>> 16) qla2xxx driver claims FC card
>>>> 17) FC card does not work
>>>>
>>>> and I assume the dmesg log you included is just from steps 15 and 16
>>>> (correct me if I'm wrong).
>>>>
>>>> It would be useful to see the entire log showing all these events so
>>>> we can compare the working cases with the non-working one. If you use
>>>> the pciehp_debug module parameter, we should also see some pciehp
>>>> events that would help me understand that driver.
>>>>
>>>
>>> Hi Bjorn,
>>> Thanks very much for your comments!
>>>
>>> My steps:
>>> 1) Power down system
>>> 2) Install FC card in slot
>>> 3) Boot system
>>> 4) Load qla2xxx driver
>>> 5) qla2xxx driver claims FC card
>>> 6) FC card works correctly (at least probe returns ok; I don't know
>>> the qla2xxx driver well)
>>> 7) rmmod qla2xxx
>>> 8) modprobe qla2xxx ql2xextended_error_logging=0x1e400000 (to get
>>> error info)
>>> 9) modprobe pciehp pciehp_debug=1
>>> 10) Hot-remove card
>>> 11) Hot-add card
>>> 12) qla2xxx driver fails to claim FC card (probe returns failure,
>>> setup chip fails)
>>> --------------------------------------so this is the failing
>>> situation----------
>>>
>>> --------------------------------------continue by hot-adding an FC card
>>> into an empty slot (which also supports PCI hotplug)
>>> 13) Install FC card in empty slot
>>> 14) Hot-add card
>>> 15) qla2xxx driver claims FC card ok (probe returns ok)
>>>
>>> btw:
>>> If the FC card firmware version is 4.03, everything is ok (hot-plug
>>> in any slot, empty or not).
>>> If the FC card firmware version is 4.04 or 5.04, the situation is the
>>> same as steps 1)--->12).
>
> That's a good data point. Let me follow up with the firmware team and get
> back to you.
>

Hi Giri,
We found that this problem (hot-plug failure for FC cards with 4.04 and
4.05 fw) is caused by the Max Payload Size (MPS) setting in the PCIe
Device Control register (DEVCTL). The MPS was 256B when the hot-plug
failure occurred. If we manually force it to 128B, everything is ok.
So maybe our hardware has some problems.

Thanks!
Yijing

> -- Giri
>>
>> Thanks. The FW change is a good clue. If everything works with
>> version 4.03, but it doesn't work with version 4.04, it's likely to be
>> a FW problem, not a Linux PCI core problem.
>>
>> Here's what I see from your logs. In slot 4 (bus 08), the card was
>> present before boot, you removed it, re-added it, and it failed after
>> being re-added. Slot 3 (bus 06) was empty at boot, you hot-added a
>> card, and it worked. Here are the resources available on those two
>> buses and the boot-time config of the first device in slot 4:
>>
>>   pci 0000:00:07.0: PCI bridge to [bus 06-07]
>>   pci 0000:00:07.0:   bridge window [io 0xc000-0xcfff]
>>   pci 0000:00:07.0:   bridge window [mem 0xf9000000-0xf9ffffff]
>>   pci 0000:00:07.0:   bridge window [mem 0xf1000000-0xf1ffffff 64bit pref]
>>   pci 0000:00:09.0: PCI bridge to [bus 08-09]
>>   pci 0000:00:09.0:   bridge window [io 0xb000-0xbfff]
>>   pci 0000:00:09.0:   bridge window [mem 0xf8000000-0xf8ffffff]
>>   pci 0000:00:09.0:   bridge window [mem 0xf0000000-0xf0ffffff 64bit pref]
>>   pci 0000:08:00.0: [1077:2532] type 00 class 0x0c0400
>>   pci 0000:08:00.0: reg 10: [io 0xb100-0xb1ff]
>>   pci 0000:08:00.0: reg 14: [mem 0xf8084000-0xf8087fff 64bit]
>>   pci 0000:08:00.0: reg 30: [mem 0xf8040000-0xf807ffff pref]
>>
>> After you remove and re-add the card in slot 4, it starts with
>> uninitialized BARs as expected, then we assign resources to it. It's
>> sort of interesting that the BIOS had originally put the ROM (reg 30)
>> in the non-prefetchable window, while after the hot-add, Linux places
>> it in the prefetchable window. Either should work, and in fact the
>> card you added in slot 3 *does* work with its ROM in the prefetchable
>> window.
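
The window placement described above can be checked mechanically. The following is only a minimal sketch in Python: the address ranges are copied from the log lines quoted above, while the names `WINDOWS`, `BOOT_BARS` and `containing_window` are made up for illustration and are not part of any kernel or driver interface.

```python
# Bridge windows of 0000:00:09.0 (slot 4, bus 08), copied from the quoted log.
WINDOWS = {
    "io": (0xB000, 0xBFFF),
    "mem": (0xF8000000, 0xF8FFFFFF),
    "mem pref": (0xF0000000, 0xF0FFFFFF),
}

# Boot-time BARs of 0000:08:00.0 as programmed by the BIOS, from the same log.
BOOT_BARS = {
    "reg 10": (0xB100, 0xB1FF),
    "reg 14": (0xF8084000, 0xF8087FFF),
    "reg 30": (0xF8040000, 0xF807FFFF),
}


def containing_window(bar, windows=WINDOWS):
    """Return the name of the bridge window that fully contains bar, or None."""
    start, end = bar
    for name, (w_start, w_end) in windows.items():
        if w_start <= start and end <= w_end:
            return name
    return None


if __name__ == "__main__":
    for reg, bar in BOOT_BARS.items():
        window = containing_window(bar)
        print(f"{reg}: [{bar[0]:#x}-{bar[1]:#x}] -> {window}")
```

With the boot-time values, reg 30 (the ROM) resolves to the non-prefetchable "mem" window, which matches the observation above. A range that fits no window returns None.
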
>>
>>   pci 0000:08:00.0: [1077:2532] type 00 class 0x0c0400
>>   pci 0000:08:00.0: reg 10: [io 0x0000-0x00ff]
>>   pci 0000:08:00.0: reg 14: [mem 0x00000000-0x00003fff 64bit]
>>   pci 0000:08:00.0: reg 30: [mem 0x00000000-0x0003ffff pref]
>>   pci 0000:08:00.0: BAR 0: assigned [io 0xb000-0xb0ff]
>>   pci 0000:08:00.0: BAR 1: assigned [mem 0xf8000000-0xf8003fff 64bit]
>>   pci 0000:08:00.0: BAR 6: assigned [mem 0xf0000000-0xf003ffff pref]
>>   qla2xxx [0000:08:00.0]-0098:10: Failed to load segment 0 of firmware.
>>   qla2xxx [0000:08:00.0]-d008:10: No buffer available for dump.
>>   qla2xxx [0000:08:00.0]-008f:10: Failed to load segment 0 of firmware.
>>   qla2xxx [0000:08:00.0]-00cf:10: Setup chip ****FAILED****.
>>
>> When you hot-add the card in slot 3, it starts with uninitialized BARs
>> as expected, but again, we assign valid resources to it:
>>
>>   pci 0000:06:00.0: [1077:2532] type 00 class 0x0c0400
>>   pci 0000:06:00.0: reg 10: [io 0x0000-0x00ff]
>>   pci 0000:06:00.0: reg 14: [mem 0x00000000-0x00003fff 64bit]
>>   pci 0000:06:00.0: reg 30: [mem 0x00000000-0x0003ffff pref]
>>   pci 0000:06:00.0: BAR 0: assigned [io 0xc000-0xc0ff]
>>   pci 0000:06:00.0: BAR 1: assigned [mem 0xf9000000-0xf9003fff 64bit]
>>   pci 0000:06:00.0: BAR 6: assigned [mem 0xf1000000-0xf103ffff pref]
>>
>> I don't see anything wrong from a PCI perspective. I suspect
>> something strange in the card firmware.
>>
>> If you do figure out something wrong in PCI, let me know.
>>
>> Bjorn
>>
>

--
Thanks!
Yijing
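
For reference, the Max Payload Size mentioned in this thread is encoded in bits 7:5 of the PCIe Device Control register: a field value of n means 128 << n bytes, so 0 is 128B and 1 is 256B. The sketch below only illustrates that encoding in Python. The constant names mirror those in the Linux `pci_regs.h` header, but the helper functions `mps_bytes` and `with_mps` and the sample register value are hypothetical. The thread does not say how the value was actually forced to 128B.

```python
PCI_EXP_DEVCTL = 0x08            # Device Control offset inside the PCIe capability
PCI_EXP_DEVCTL_PAYLOAD = 0x00E0  # Max_Payload_Size field, bits 7:5

VALID_MPS = (128, 256, 512, 1024, 2048, 4096)


def mps_bytes(devctl):
    """Decode Max Payload Size in bytes from a 16-bit Device Control value."""
    code = (devctl & PCI_EXP_DEVCTL_PAYLOAD) >> 5
    if code > 5:
        # Encodings 110b and 111b are reserved by the PCIe specification.
        raise ValueError(f"reserved MPS encoding {code:#05b}")
    return 128 << code


def with_mps(devctl, size):
    """Return devctl with its MPS field set to size bytes, other bits unchanged."""
    if size not in VALID_MPS:
        raise ValueError(f"unsupported MPS {size}")
    code = size.bit_length() - 8  # 128 -> 0, 256 -> 1, ..., 4096 -> 5
    return (devctl & ~PCI_EXP_DEVCTL_PAYLOAD & 0xFFFF) | (code << 5)


if __name__ == "__main__":
    devctl = 0x283F  # hypothetical register value, not taken from the thread
    print(f"before: {devctl:#06x} MPS={mps_bytes(devctl)}B")
    forced = with_mps(devctl, 128)
    print(f"after:  {forced:#06x} MPS={mps_bytes(forced)}B")
```

Only the three MPS bits change. The rest of the register value is preserved, which matters because the same register also carries the error-reporting enables and the Max Read Request Size field.
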