On Thu, Oct 29 2009, Kenji Kaneshige wrote:
> Jens Axboe wrote:
>> On Wed, Oct 28 2009, Kenji Kaneshige wrote:
>>> Jens Axboe wrote:
>>>> On Tue, Oct 27 2009, Kenji Kaneshige wrote:
>>>>> Jens Axboe wrote:
>>>>>> On Tue, Oct 20 2009, Alex Chiang wrote:
>>>>>>> * Jens Axboe <jens.axboe@xxxxxxxxxx>:
>>>>>>>> On Tue, Oct 13 2009, Alex Chiang wrote:
>>>>>>>>>>> Can you modprobe acpiphp with debug=1? And send the output?
>>>>>>>>>> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
>>>>>>>>>> acpiphp: Slot [1] registered
>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
>>>>>>>>>> acpiphp: Slot [2] registered
>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
>>>>>>>>>> acpiphp: Slot [6] registered
>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
>>>>>>>>>> acpiphp: Slot [7] registered
>>>>>>>>>> acpiphp_glue: Bus 0000:87 has 1 slot
>>>>>>>>>> acpiphp_glue: Bus 0000:84 has 1 slot
>>>>>>>>>> acpiphp_glue: Bus 0000:0b has 1 slot
>>>>>>>>>> acpiphp_glue: Bus 0000:08 has 1 slot
>>>>>>>>>> acpiphp_glue: Total 4 slots
>>>>>>>>> You mentioned in another mail that you echoed 1 into the various
>>>>>>>>> slots' power files.
>>>>>>>>>
>>>>>>>>> Did you do that after modprobing acpiphp with debug=1?
>>>>>>>>>
>>>>>>>>> If so, there should be debug output when you try and turn them
>>>>>>>>> on.
>>>>>>>> It produces:
>>>>>>>>
>>>>>>>> acpiphp: enable_slot - physical_slot = 1
>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>> acpiphp: enable_slot - physical_slot = 2
>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>> acpiphp: enable_slot - physical_slot = 6
>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>> acpiphp: enable_slot - physical_slot = 7
>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>> Hm, so for some reason, firmware on your machine is telling us
>>>>>>> that it doesn't think cards are present and/or enabled.
>>>>>>>
>>>>>>> Unfortunately, I don't know why your firmware would be saying
>>>>>>> that. We could add some more debug printks to see what firmware
>>>>>>> thinks about your system... Or we could just wait and see what
>>>>>>> happens after you get your hardware replaced.
>>>>>> New board, the exact same thing happens.
>>>>>>
>>>>>>>> I have a card in one of the slots only this time.
>>>>>>>>
>>>>>>>>> Also, quick dummy check, you are trying to power on populated
>>>>>>>>> slots, right? :)
>>>>>>>> Yes :-)
>>>>>>>>
>>>>>>>>> Can you send the output of lspci -vv? And I like the output of
>>>>>>>>> lspci -vt as well... Both before and after loading acpiphp
>>>>>>>>> please.
>>>>>>>> Send privately.
>>>>>>> No difference in before and after. Odd.
>>>>>>>
>>>>>>> If you want to poke us again after your hardware swap, please do
>>>>>>> so. Sorry for being not so helpful. :-/
>>>>>> Poke :-)
>>>>>>
>>>>>> One more thing I tried was pushing the power button on the slot
>>>>>> manually. With acpiphp, I get the same messages as above. Using pciehp,
>>>>>> I get the same power fault bit interrupt storm. So no difference from
>>>>>> using the sysfs interface or doing it on the box side, doesn't work
>>>>>> either way.
>>>>>>
>>>>> I'd like to confirm power fault interrupt storm, just in case.
>>>>> Could you get /proc/interrupts information after power fault
>>>>> problem happens and send it to me?
>>>> The box pretty much hangs when I try to power on a slot with pciehp, so
>>>> it's not easy to do... It doesn't hang with acpiphp, but doesn't work
>>>> either (see previous reply to Alex).
>>>>
>>> Could you try the attached debugging patch? With this patch, power
>>> fault interrupt would be disabled after 100 power fault detected (
>>> I hope so). You can get /proc/interrupts after that.
>>
>> Here is the output of doing the power on with that patch applied.
>>
>> pciehp 0000:00:05.0:pcie04: enable_slot: physical_slot = 1
>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 77b
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>> pciehp 0000:00:05.0:pcie04: pciehp_power_on_slot: SLOTCTRL a8 write cmd 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: Power fault interrupt received
>> pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
>> pciehp 0000:00:05.0:pcie04: Power fault bit 0 set
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>
pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 
0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2 >> pciehp 0000:00:05.0:pcie04: 
pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>> pciehp 0000:00:05.0:pcie04: Data Link Layer Link Active not set in 1000 msec
>> pciehp 0000:00:05.0:pcie04: pciehp_check_link_status: lnk_status = 1001
>> pciehp 0000:00:05.0:pcie04: Link Training Error occurs
>> pciehp 0000:00:05.0:pcie04: Failed to check link status
>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>> pciehp 0000:00:05.0:pcie04: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 779
>> pciehp 0000:00:05.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 779
>>
>
> From the console log, it seems that my debug patch worked as I expected
> (power fault event interrupts were disabled after 100 power fault event).
> But for some reasons, /proc/interrupts indicates only 5 interrupts of
> pciehp. Just in case, did you get /proc/interrupts after doing power on?

Nope, it was captured post the power on attempt and the above log dump.

-- 
Jens Axboe
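
For anyone comparing the console log against /proc/interrupts, here is a rough Python sketch that tallies the pcie_isr lines by event. It assumes intr_loc is printed in hex and is a mask of the event bits from the PCI Express Slot Status register (0x02 Power Fault Detected, 0x10 Command Completed), which fits the 2, 10 and 12 values in the log. The function names and the regular expression are made up for illustration; they are not part of pciehp.

```python
import re

# Event bits in the low byte of the PCI Express Slot Status register.
SLOT_STATUS_EVENTS = [
    (0x01, "Attention Button Pressed"),
    (0x02, "Power Fault Detected"),
    (0x04, "MRL Sensor Changed"),
    (0x08, "Presence Detect Changed"),
    (0x10, "Command Completed"),
]

# Assumption: each debug line looks like "... pcie_isr: intr_loc <hex>".
INTR_LOC_RE = re.compile(r"pcie_isr: intr_loc ([0-9a-fA-F]+)")


def decode_intr_loc(value):
    """Return the names of the event bits set in an intr_loc value."""
    return [name for bit, name in SLOT_STATUS_EVENTS if value & bit]


def count_power_faults(log_text):
    """Count pcie_isr lines whose intr_loc has the power fault bit set."""
    values = (int(m.group(1), 16) for m in INTR_LOC_RE.finditer(log_text))
    return sum(1 for v in values if v & 0x02)


if __name__ == "__main__":
    # A few lines in the same shape as the console log in this thread.
    sample = "\n".join([
        "pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10",
        "pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2",
        "pciehp 0000:00:05.0:pcie04: Power fault interrupt received",
        "pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12",
    ])
    for value in (0x10, 0x02, 0x12):
        print(hex(value), decode_intr_loc(value))
    print("power fault events:", count_power_faults(sample))
```

Running it over a saved copy of the full console log gives a number to set against the pciehp count in /proc/interrupts. The two need not match one to one, since a single interrupt may cover more than one logged event.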