Re: Fail to probe qla2xxx fiber channel card while doing pci hotplug

On 09/19/2012 09:39 PM, Bjorn Helgaas wrote:
> On Tue, Sep 18, 2012 at 7:50 PM, Yijing Wang <wangyijing@xxxxxxxxxx> wrote:
>> On 2012/9/19 1:54, Bjorn Helgaas wrote:
>>> On Mon, Sep 17, 2012 at 6:06 AM, Yijing Wang <wangyijing@xxxxxxxxxx> wrote:
>>>> On 2012/9/16 11:30, Bjorn Helgaas wrote:
>>>>> On Sat, Sep 15, 2012 at 4:22 AM, Yijing Wang <wangyijing@xxxxxxxxxx> wrote:
>>>>>> Hi all,
>>>>>>    I encountered a very strange problem when hot-plugging a fiber channel card (using the qla2xxx driver).
>>>>>> I did the hotplug on an x86 machine, using the pciehp driver; this platform supports triggering PCI hotplug from both
>>>>>> sysfs and the attention button. If a hotplug slot is empty at boot, hot-adding an FC card in that slot works.
>>>>>> If a hotplug slot already holds an FC card at boot, hot-removing the card works, but hot-adding it again fails.
>>>>>> I used
>>>>>> #modprobe qla2xxx ql2xextended_error_logging=0x7fffffff
>>>>>> to get all probe info, as below:
>>>>>>
>>>>>> Can anyone give me any suggestion for this problem?
>>>>>
>>>>> It sounds like you did this:
>>>>>
>>>>>   1) Power down system
>>>>>   2) Remove FC card from slot
>>>>>   3) Boot system
>>>>>   4) Hot-add FC card
>>>>>   5) Load qla2xxx driver
>>>>>   6) qla2xxx driver claims FC card
>>>>>   7) FC card works correctly
>>>>>
>>>>>   8) Power down system
>>>>>   9) Install FC card in slot
>>>>>  10) Boot system
>>>>>  11) Load qla2xxx driver
>>>>>  12) qla2xxx driver claims FC card
>>>>>  13) FC card works correctly
>>>> Here I rmmod the qla2xxx driver and modprobe qla2xxx ql2xextended_error_logging=0x1e400000 again to get error info.
>>>> I also modprobe pciehp pciehp_debug=1 to get debug info.
>>>>>  14) Hot-remove card
>>>>>  15) Hot-add card
>>>>>  16) qla2xxx driver claims FC card
>>>>>  17) FC card does not work
>>>>>
>>>>> and I assume the dmesg log you included is just from steps 15 and 16
>>>>> (correct me if I'm wrong).
>>>>>
>>>>> It would be useful to see the entire log showing all these events so
>>>>> we can compare the working cases with the non-working one.  If you use
>>>>> the pciehp_debug module parameter, we should also see some pciehp
>>>>> events that would help me understand that driver.
>>>>>
>>>>
>>>> Hi Bjorn,
>>>>    Thanks for your comments very much!
>>>>
>>>> My steps:
>>>> 1) power down system
>>>> 2) Install FC card in slot
>>>> 3) Boot system
>>>> 4) Load qla2xxx driver
>>>> 5) qla2xxx driver claims FC card
>>>> 6) FC card works correctly (at least probe returns OK; I don't know the qla2xxx driver well)
>>>> 7) rmmod qla2xxx
>>>> 8) modprobe qla2xxx ql2xextended_error_logging=0x1e400000 (to get error info)
>>>> 9) modprobe pciehp pciehp_debug=1
>>>> 10) Hot-remove card
>>>> 11) Hot-add card
>>>> 12) qla2xxx driver fails to claim the FC card (probe returns failure, chip setup fails)
>>>> -------------------------------------- so this is the failing case ----------
>>>>
>>>> -------------------------------------- continue by hot-adding an FC card into an empty slot (which also supports PCI hotplug)
>>>> 13) Install FC card in empty slot
>>>> 14) Hot-add card
>>>> 15) qla2xxx driver claims FC card OK (probe returns OK)
>>>>
>>>> btw:
>>>> With FC card firmware version 4.03, everything works (hotplug in any slot, empty at boot or not).
>>>> With FC card firmware version 4.04 or 5.04, the behavior is the same as in steps 1) through 12).
>>>
>>> Thanks.  The FW change is a good clue.  If everything works with
>>> version 4.03, but it doesn't work with version 4.04, it's likely to be
>>> a FW problem, not a Linux PCI core problem.
>>>
>>> Here's what I see from your logs.  In slot 4 (bus 08), the card was
>>> present before boot, you removed it, re-added it, and it failed after
>>> being re-added.  Slot 3 (bus 06) was empty at boot, you hot-added a
>>> card, and it worked.  Here are the resources available on those two
>>> buses and the boot-time config of the first device in slot 4:
>>>
>>>       pci 0000:00:07.0: PCI bridge to [bus 06-07]
>>>       pci 0000:00:07.0:   bridge window [io  0xc000-0xcfff]
>>>       pci 0000:00:07.0:   bridge window [mem 0xf9000000-0xf9ffffff]
>>>       pci 0000:00:07.0:   bridge window [mem 0xf1000000-0xf1ffffff 64bit pref]
>>>       pci 0000:00:09.0: PCI bridge to [bus 08-09]
>>>       pci 0000:00:09.0:   bridge window [io  0xb000-0xbfff]
>>>       pci 0000:00:09.0:   bridge window [mem 0xf8000000-0xf8ffffff]
>>>       pci 0000:00:09.0:   bridge window [mem 0xf0000000-0xf0ffffff 64bit pref]
>>>       pci 0000:08:00.0: [1077:2532] type 00 class 0x0c0400
>>>       pci 0000:08:00.0: reg 10: [io  0xb100-0xb1ff]
>>>       pci 0000:08:00.0: reg 14: [mem 0xf8084000-0xf8087fff 64bit]
>>>       pci 0000:08:00.0: reg 30: [mem 0xf8040000-0xf807ffff pref]
>>>
>>> After you remove and re-add the card in slot 4, it starts with
>>> uninitialized BARs as expected, then we assign resources to it.  It's
>>> sort of interesting that the BIOS had originally put the ROM (reg 30)
>>> in the non-prefetchable window, while after the hot-add, Linux places
>>> it in the prefetchable window.  Either should work, and in fact the
>>> card you added in slot 3 *does* work with its ROM in the prefetchable
>>> window.
>>>
>>>       pci 0000:08:00.0: [1077:2532] type 00 class 0x0c0400
>>>       pci 0000:08:00.0: reg 10: [io  0x0000-0x00ff]
>>>       pci 0000:08:00.0: reg 14: [mem 0x00000000-0x00003fff 64bit]
>>>       pci 0000:08:00.0: reg 30: [mem 0x00000000-0x0003ffff pref]
>>>       pci 0000:08:00.0: BAR 0: assigned [io  0xb000-0xb0ff]
>>>       pci 0000:08:00.0: BAR 1: assigned [mem 0xf8000000-0xf8003fff 64bit]
>>>       pci 0000:08:00.0: BAR 6: assigned [mem 0xf0000000-0xf003ffff pref]
>>>       qla2xxx [0000:08:00.0]-0098:10: Failed to load segment 0 of firmware.
>>>       qla2xxx [0000:08:00.0]-d008:10: No buffer available for dump.
>>>       qla2xxx [0000:08:00.0]-008f:10: Failed to load segment 0 of firmware.
>>>       qla2xxx [0000:08:00.0]-00cf:10: Setup chip ****FAILED****.
>>>
>>> When you hot-add the card in slot 3, it starts with uninitialized BARs
>>> as expected, but again, we assign valid resources to it:
>>>
>>>       pci 0000:06:00.0: [1077:2532] type 00 class 0x0c0400
>>>       pci 0000:06:00.0: reg 10: [io  0x0000-0x00ff]
>>>       pci 0000:06:00.0: reg 14: [mem 0x00000000-0x00003fff 64bit]
>>>       pci 0000:06:00.0: reg 30: [mem 0x00000000-0x0003ffff pref]
>>>       pci 0000:06:00.0: BAR 0: assigned [io  0xc000-0xc0ff]
>>>       pci 0000:06:00.0: BAR 1: assigned [mem 0xf9000000-0xf9003fff 64bit]
>>>       pci 0000:06:00.0: BAR 6: assigned [mem 0xf1000000-0xf103ffff pref]
>>>
>>> I don't see anything wrong from a PCI perspective.  I suspect
>>> something strange in the card firmware.
>>>
>>> If you do figure out something wrong in PCI, let me know.
>>>
>>> Bjorn
>>>
>>
>> Hi Bjorn,
>>    Thanks for your detailed analysis very much!
>>
>> We compared the two situations after BIOS initialization and found that Max Payload Size in the Device
>> Control register is 256B if the FC card was installed at boot, and 128B if the slot was empty. We forced
>> it to 128B with the FC card installed at boot, and after that PCI hotplug worked. So I suspect our PCIe
>> hardware may have a problem supporting 256B.
> 
> Ah, this sounds like something I've been worried about for a while,
> i.e., do we handle MPS correctly when we hot-add devices?
> 
> Yijing, I'm not quite clear on what you're observing.  I guess you're
> saying that if an FC card is installed at boot, the BIOS sets MPS to
> 256, and that if no FC card is installed, the BIOS sets MPS to 128?
> You haven't mentioned any Linux boot options, so I assume you haven't
> tried any.  Does "pci=pcie_bus_safe" make any difference?
> 
> Jon, here's a pointer to the beginning of the thread:
> http://marc.info/?l=linux-pci&m=134770460302298&w=2 (full dmesg log at
> http://marc.info/?l=linux-scsi&m=134788365823217&w=2).  I'm not sure
> we have enough in the dmesg log to diagnose an issue like this.  I
> wonder if it would be useful to log the current setting, so we could
> notice BIOS default differences like this one.

Hi Yijing,
	It's possible that the issue is caused by pcie_bus_configure_settings() rather than a
hardware flaw. By default, pcie_bus_config is set to PCIE_BUS_TUNE_OFF, which means
every PCIe device's Max Payload Size is configured by the BIOS and the OS won't change it.
	So could you please help by doing the following:
	1) Add the "pci=pcie_bus_safe" kernel option and check whether the behavior changes.
	2) Print out the Max Payload Size configuration for all PCIe devices along the path from
the hot-added card to the corresponding root port.
	3) Trace the execution of pcie_bus_configure_settings().
	Thanks!
	Gerry
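
For step 2 above, a minimal sketch in Python of how the Max Payload Size
values could be collected from sysfs. It is not part of any kernel tool: the
helper names (find_pcie_cap, payload_sizes, path_to_root, report) are
hypothetical, and it assumes the usual sysfs layout under
/sys/bus/pci/devices. The register offsets and bit fields follow the PCI
Express capability layout (Device Capabilities bits 2:0 for the supported
size, Device Control bits 7:5 for the configured size).

```python
import os
import re
import struct

PCI_STATUS = 0x06            # Status register offset
PCI_STATUS_CAP_LIST = 0x10   # set when a capability list exists
PCI_CAPABILITY_LIST = 0x34   # offset of the first capability pointer
PCI_CAP_ID_EXP = 0x10        # PCI Express capability ID
PCI_EXP_DEVCAP = 0x04        # Device Capabilities, relative to the capability
PCI_EXP_DEVCTL = 0x08        # Device Control, relative to the capability

# Matches a domain:bus:device.function name such as 0000:08:00.0
BDF = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def find_pcie_cap(cfg):
    """Return the offset of the PCI Express capability, or 0 if not found."""
    status = struct.unpack_from("<H", cfg, PCI_STATUS)[0]
    if not status & PCI_STATUS_CAP_LIST:
        return 0
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()  # guards against a looping capability list
    while pos and pos not in seen and pos + 1 < len(cfg):
        seen.add(pos)
        if cfg[pos] == PCI_CAP_ID_EXP:
            return pos
        pos = cfg[pos + 1] & 0xFC
    return 0


def payload_sizes(cfg):
    """Return (supported, configured) MPS in bytes, or None if unavailable."""
    cap = find_pcie_cap(cfg)
    if not cap or cap + PCI_EXP_DEVCTL + 2 > len(cfg):
        return None
    devcap = struct.unpack_from("<I", cfg, cap + PCI_EXP_DEVCAP)[0]
    devctl = struct.unpack_from("<H", cfg, cap + PCI_EXP_DEVCTL)[0]
    supported = 128 << (devcap & 0x7)          # Device Capabilities bits 2:0
    configured = 128 << ((devctl >> 5) & 0x7)  # Device Control bits 7:5
    return supported, configured


def path_to_root(sysfs_path):
    """List PCI functions from the device up to its root port."""
    return [p for p in reversed(sysfs_path.split("/")) if BDF.match(p)]


def report(bdf):
    """Print the MPS settings for bdf and every bridge above it."""
    real = os.path.realpath("/sys/bus/pci/devices/" + bdf)
    for dev in path_to_root(real):
        with open("/sys/bus/pci/devices/%s/config" % dev, "rb") as f:
            print(dev, payload_sizes(f.read()))
```

Calling report("0000:08:00.0") would print one line per device on the path.
It probably needs to run as root, since unprivileged reads of the sysfs
config file return only the first 64 bytes and the capability usually sits
beyond that; in that case the sketch prints None for the device.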
