On 2012/9/19 23:31, Jiang Liu wrote:
> On 09/19/2012 09:39 PM, Bjorn Helgaas wrote:
>> On Tue, Sep 18, 2012 at 7:50 PM, Yijing Wang <wangyijing@xxxxxxxxxx> wrote:
>>> On 2012/9/19 1:54, Bjorn Helgaas wrote:
>>>> On Mon, Sep 17, 2012 at 6:06 AM, Yijing Wang <wangyijing@xxxxxxxxxx> wrote:
>>>>> On 2012/9/16 11:30, Bjorn Helgaas wrote:
>>>>>> On Sat, Sep 15, 2012 at 4:22 AM, Yijing Wang <wangyijing@xxxxxxxxxx> wrote:
>>>>>>> Hi all,
>>>>>>> I encountered a very strange problem when I hot-plug a fiber channel card (using the qla2xxx driver).
>>>>>>> I did the hotplug on an x86 machine, using the pciehp driver; this platform supports triggering PCI hot-plug from both
>>>>>>> sysfs and the attention button. If a hot-plug slot is empty when the system boots, then hot-adding an FC card in this slot is OK.
>>>>>>> If a hot-plug slot already holds an FC card when the system boots, hot-removing this card is OK, but hot-adding it again fails.
>>>>>>> I used
>>>>>>> #modprobe qla2xxx ql2xextended_error_logging=0x7fffffff
>>>>>>> to get all probe info, as below:
>>>>>>>
>>>>>>> Can anyone give me any suggestions for this problem?
>>>>>>
>>>>>> It sounds like you did this:
>>>>>>
>>>>>> 1) Power down system
>>>>>> 2) Remove FC card from slot
>>>>>> 3) Boot system
>>>>>> 4) Hot-add FC card
>>>>>> 5) Load qla2xxx driver
>>>>>> 6) qla2xxx driver claims FC card
>>>>>> 7) FC card works correctly
>>>>>>
>>>>>> 8) Power down system
>>>>>> 9) Install FC card in slot
>>>>>> 10) Boot system
>>>>>> 11) Load qla2xxx driver
>>>>>> 12) qla2xxx driver claims FC card
>>>>>> 13) FC card works correctly
>>>>> Here I rmmod the qla2xxx driver and modprobe qla2xxx ql2xextended_error_logging=0x1e400000 again to get error info.
>>>>> I also modprobe pciehp pciehp_debug=1 to get debug info.
>>>>>> 14) Hot-remove card
>>>>>> 15) Hot-add card
>>>>>> 16) qla2xxx driver claims FC card
>>>>>> 17) FC card does not work
>>>>>>
>>>>>> and I assume the dmesg log you included is just from steps 15 and 16
>>>>>> (correct me if I'm wrong).
>>>>>>
>>>>>> It would be useful to see the entire log showing all these events so
>>>>>> we can compare the working cases with the non-working one. If you use
>>>>>> the pciehp_debug module parameter, we should also see some pciehp
>>>>>> events that would help me understand that driver.
>>>>>>
>>>>>
>>>>> Hi Bjorn,
>>>>> Thanks very much for your comments!
>>>>>
>>>>> My steps:
>>>>> 1) Power down system
>>>>> 2) Install FC card in slot
>>>>> 3) Boot system
>>>>> 4) Load qla2xxx driver
>>>>> 5) qla2xxx driver claims FC card
>>>>> 6) FC card works correctly (at least probe returns OK; I don't know the qla2xxx driver well)
>>>>> 7) rmmod qla2xxx
>>>>> 8) modprobe qla2xxx ql2xextended_error_logging=0x1e400000 (to get error info)
>>>>> 9) modprobe pciehp pciehp_debug=1
>>>>> 10) Hot-remove card
>>>>> 11) Hot-add card
>>>>> 12) qla2xxx driver fails to claim FC card (probe returns failure, chip setup fails)
>>>>> --------------------------------------so this is the failing situation----------
>>>>>
>>>>> --------------------------------------continue by hot-adding an FC card into an empty slot (which also supports PCI hotplug)
>>>>> 13) Install FC card in empty slot
>>>>> 14) Hot-add card
>>>>> 15) qla2xxx driver claims FC card OK (probe returns OK)
>>>>>
>>>>> btw:
>>>>> If the FC card firmware version is 4.03, everything is OK (hot-plug works in any slot, empty or not).
>>>>> If the FC card firmware version is 4.04 or 5.04, the situation is the same as steps 1) to 12).
>>>>
>>>> Thanks. The FW change is a good clue. If everything works with
>>>> version 4.03, but it doesn't work with version 4.04, it's likely to be
>>>> a FW problem, not a Linux PCI core problem.
>>>>
>>>> Here's what I see from your logs. In slot 4 (bus 08), the card was
>>>> present before boot, you removed it, re-added it, and it failed after
>>>> being re-added. Slot 3 (bus 06) was empty at boot, you hot-added a
>>>> card, and it worked.
>>>> Here are the resources available on those two
>>>> buses and the boot-time config of the first device in slot 4:
>>>>
>>>> pci 0000:00:07.0: PCI bridge to [bus 06-07]
>>>> pci 0000:00:07.0: bridge window [io 0xc000-0xcfff]
>>>> pci 0000:00:07.0: bridge window [mem 0xf9000000-0xf9ffffff]
>>>> pci 0000:00:07.0: bridge window [mem 0xf1000000-0xf1ffffff 64bit pref]
>>>> pci 0000:00:09.0: PCI bridge to [bus 08-09]
>>>> pci 0000:00:09.0: bridge window [io 0xb000-0xbfff]
>>>> pci 0000:00:09.0: bridge window [mem 0xf8000000-0xf8ffffff]
>>>> pci 0000:00:09.0: bridge window [mem 0xf0000000-0xf0ffffff 64bit pref]
>>>> pci 0000:08:00.0: [1077:2532] type 00 class 0x0c0400
>>>> pci 0000:08:00.0: reg 10: [io 0xb100-0xb1ff]
>>>> pci 0000:08:00.0: reg 14: [mem 0xf8084000-0xf8087fff 64bit]
>>>> pci 0000:08:00.0: reg 30: [mem 0xf8040000-0xf807ffff pref]
>>>>
>>>> After you remove and re-add the card in slot 4, it starts with
>>>> uninitialized BARs as expected, then we assign resources to it. It's
>>>> sort of interesting that the BIOS had originally put the ROM (reg 30)
>>>> in the non-prefetchable window, while after the hot-add, Linux places
>>>> it in the prefetchable window. Either should work, and in fact the
>>>> card you added in slot 3 *does* work with its ROM in the prefetchable
>>>> window.
>>>>
>>>> pci 0000:08:00.0: [1077:2532] type 00 class 0x0c0400
>>>> pci 0000:08:00.0: reg 10: [io 0x0000-0x00ff]
>>>> pci 0000:08:00.0: reg 14: [mem 0x00000000-0x00003fff 64bit]
>>>> pci 0000:08:00.0: reg 30: [mem 0x00000000-0x0003ffff pref]
>>>> pci 0000:08:00.0: BAR 0: assigned [io 0xb000-0xb0ff]
>>>> pci 0000:08:00.0: BAR 1: assigned [mem 0xf8000000-0xf8003fff 64bit]
>>>> pci 0000:08:00.0: BAR 6: assigned [mem 0xf0000000-0xf003ffff pref]
>>>> qla2xxx [0000:08:00.0]-0098:10: Failed to load segment 0 of firmware.
>>>> qla2xxx [0000:08:00.0]-d008:10: No buffer available for dump.
>>>> qla2xxx [0000:08:00.0]-008f:10: Failed to load segment 0 of firmware.
>>>> qla2xxx [0000:08:00.0]-00cf:10: Setup chip ****FAILED****.
>>>>
>>>> When you hot-add the card in slot 3, it starts with uninitialized BARs
>>>> as expected, but again, we assign valid resources to it:
>>>>
>>>> pci 0000:06:00.0: [1077:2532] type 00 class 0x0c0400
>>>> pci 0000:06:00.0: reg 10: [io 0x0000-0x00ff]
>>>> pci 0000:06:00.0: reg 14: [mem 0x00000000-0x00003fff 64bit]
>>>> pci 0000:06:00.0: reg 30: [mem 0x00000000-0x0003ffff pref]
>>>> pci 0000:06:00.0: BAR 0: assigned [io 0xc000-0xc0ff]
>>>> pci 0000:06:00.0: BAR 1: assigned [mem 0xf9000000-0xf9003fff 64bit]
>>>> pci 0000:06:00.0: BAR 6: assigned [mem 0xf1000000-0xf103ffff pref]
>>>>
>>>> I don't see anything wrong from a PCI perspective. I suspect
>>>> something strange in the card firmware.
>>>>
>>>> If you do figure out something wrong in PCI, let me know.
>>>>
>>>> Bjorn
>>>>
>>>
>>> Hi Bjorn,
>>> Thanks very much for your detailed analysis!
>>>
>>> We compared the two situations after BIOS initialization, and found that Max Payload Size in DEVCTRL is 256B
>>> if the FC card was installed at boot; if the slot was empty, Max Payload Size is 128B. We forced it to 128B with
>>> the FC card installed at boot, and PCI hotplug then worked. So I suspect our PCIe hardware
>>> may have a problem supporting 256B.
>>
>> Ah, this sounds like something I've been worried about for a while,
>> i.e., do we handle MPS correctly when we hot-add devices?
>>
>> Yijing, I'm not quite clear on what you're observing. I guess you're
>> saying that if an FC card is installed at boot, the BIOS sets MPS to
>> 256, and that if no FC card is installed, the BIOS sets MPS to 128?
>> You haven't mentioned any Linux boot options, so I assume you haven't
>> tried any. Does "pci=pcie_bus_safe" make any difference?
>>
>> Jon, here's a pointer to the beginning of the thread:
>> http://marc.info/?l=linux-pci&m=134770460302298&w=2 (full dmesg log at
>> http://marc.info/?l=linux-scsi&m=134788365823217&w=2).
>> I'm not sure we have enough in the dmesg log to diagnose an issue like this. I
>> wonder if it would be useful to log the current setting, so we could
>> notice BIOS default differences like this one.
>
> Hi Yijing,
> It's possible that the issue is caused by pcie_bus_configure_settings() instead of
> a hardware flaw. By default, pcie_bus_config is set to PCIE_BUS_TUNE_OFF, which means
> every PCIe device's Max Payload Size is configured by the BIOS and the OS won't change it.
> So could you please help to:
> 1) Add the "pci=pcie_bus_safe" kernel option and check whether the behavior changes.
> 2) Print out the Max Payload Size configuration for all PCIe devices along the path from
> the hot-added card to the corresponding root port.
> 3) Trace the execution of pcie_bus_configure_settings().
> Thanks!
> Gerry
>
OK, maybe you are right, I will try that next.

Thanks
Yijing

--
Thanks!
Yijing
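
For anyone following along, here is a minimal sketch of step 2 from Gerry's list (printing Max Payload Size for every device from the hot-added card up to its root port). It is not from the thread. It assumes the usual Linux sysfs layout under /sys/bus/pci/devices and that it runs as root, since sysfs exposes only the first 64 bytes of config space to unprivileged users. The helper names find_pcie_cap, mps_bytes and path_to_root are made up for illustration. The decoding itself follows the PCIe Device Control register layout: bits 7:5 hold the MPS encoding, and the size is 128 bytes shifted left by that value.

```python
import os
import re
import sys

PCI_STATUS = 0x06            # low byte of the Status register
PCI_STATUS_CAP_LIST = 0x10   # Status bit 4: capability list present
PCI_CAPABILITY_LIST = 0x34   # pointer to the first capability
PCI_CAP_ID_EXP = 0x10        # PCI Express capability ID
PCI_EXP_DEVCTL = 0x08        # Device Control offset inside the PCIe capability

# Matches sysfs device names such as 0000:08:00.0 (but not "pci0000:00").
BDF_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def find_pcie_cap(cfg: bytes) -> int:
    """Return the config-space offset of the PCIe capability, or 0 if absent."""
    if len(cfg) < 64 or not (cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST):
        return 0
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    # Walk the linked list of capabilities; guard against loops and short reads.
    while pos >= 0x40 and pos + 1 < len(cfg) and pos not in seen:
        seen.add(pos)
        if cfg[pos] == PCI_CAP_ID_EXP:
            return pos
        pos = cfg[pos + 1] & 0xFC
    return 0


def mps_bytes(cfg: bytes):
    """Decode Max Payload Size in bytes from Device Control, or None."""
    cap = find_pcie_cap(cfg)
    if not cap or cap + PCI_EXP_DEVCTL + 2 > len(cfg):
        return None
    devctl = int.from_bytes(cfg[cap + PCI_EXP_DEVCTL:cap + PCI_EXP_DEVCTL + 2],
                            "little")
    return 128 << ((devctl >> 5) & 0x7)   # encodings 6 and 7 are reserved


def path_to_root(bdf: str, sysfs: str = "/sys/bus/pci/devices"):
    """List devices from the given one up to its root port (assumed layout)."""
    real = os.path.realpath(os.path.join(sysfs, bdf))
    return [p for p in reversed(real.split(os.sep)) if BDF_RE.match(p)]


if __name__ == "__main__" and len(sys.argv) > 1:
    for dev in path_to_root(sys.argv[1]):
        with open(f"/sys/bus/pci/devices/{dev}/config", "rb") as f:
            print(dev, "MPS:", mps_bytes(f.read()))
```

Run it with the hot-added device's address as the only argument, for example 0000:08:00.0 for the failing slot in this thread. The same field is also shown by lspci -vv as "MaxPayload" on the DevCtl line, which is probably the quicker check if lspci is available.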