Re: 3.9-rc1: pciehp and eSATA card SiI 3132, no XHCI

Martin Mokrejs <mmokrejs@xxxxxxxxxxxxxxxxxx> · Tue, 30 Apr 2013 23:09:07 +0200

Hi,
  I went to test the final 3.9 kernel and was almost ready to report no difference
in pciehp and *no hotplug functionality for any of my 3 express cards*. But ...
I also tested changing pcie_aspm=off to pcie_aspm=native, and hotplug events started
to work!
So I would like to ask you to review the code affected by these two values, in order
to find an explanation. You have the collected data and the email archives available,
so I believe this should now be easy for you.

  If you want to compare against the 3.9 kernel, I uploaded the data collected with pciehp
under 3.9 to http://195.113.57.32/~mmokrejs/tmp/20130430.tar.bz2 .
The 3.8.8 data is from acpiphp testing, so ignore it for this thread.

Thank you,
Martin

Martin Mokrejs wrote:
> Huang Ying wrote:
>> On Sun, 2013-03-31 at 17:04 +0200, Martin Mokrejs wrote:
>>> Hi Ying,
>>>   
>>> Huang Ying wrote:
>>>> Hi, Martin,
>>>>
>>>> Thanks for your testing!
>>>>
>>>> On Sun, 2013-03-31 at 12:35 +0200, Martin Mokrejs wrote:
>>>>> Hi Ying,
>>>>>   I have tested your last patch 4 times. Somehow nothing gets logged to "dmesg"
>>>>> when I hot-remove or hot-insert the cold-booted eSATA card. Logging itself works:
>>>>> enabling wifi via Fn+F2 is logged, as are occasional stack traces
>>>>> and kmemleak reports.
>>>>>   I removed the cold-booted card, inserted it and ejected it.
>>>>>
>>>>>
>>>>>   In brief, lspci reports changes but there are no changes in /proc/interrupts
>>>>> related to
>>>>>
>>>>>   19:          0          0   IO-APIC-fasteoi   sata_sil24
>>>>>
>>>>>
>>>>> and no changes at all in /proc/iomem, which I expected to happen during
>>>>> hot-removal and hot-insertion (something is broken in 3.9-rc1 with your patch).
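To tell whether an IRQ handler for sata_sil24 is registered at all (as opposed to registered but receiving no interrupts), /proc/interrupts can be parsed mechanically. A minimal sketch, assuming the usual layout of a "CPU0 CPU1 ..." header row followed by lines of the form `IRQ: per-CPU counts, chip type, handler names`; the helpers `parse_interrupts` and `handler_irqs` are hypothetical:

```python
from typing import Dict, List, Tuple

def parse_interrupts(text: str) -> Dict[str, Tuple[List[int], str]]:
    """Parse /proc/interrupts-style text into {irq: (per_cpu_counts, description)}."""
    table: Dict[str, Tuple[List[int], str]] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the "CPU0 CPU1 ..." header row has no colon
        irq, _, rest = line.partition(":")
        tokens = rest.split()
        counts: List[int] = []
        # Leading all-digit tokens are the per-CPU counters.
        while tokens and tokens[0].isdigit():
            counts.append(int(tokens.pop(0)))
        # What remains is the interrupt chip type followed by handler names.
        table[irq.strip()] = (counts, " ".join(tokens))
    return table

def handler_irqs(table: Dict[str, Tuple[List[int], str]], handler: str) -> Dict[str, int]:
    """Map each IRQ whose handler list contains `handler` to its total count."""
    found: Dict[str, int] = {}
    for irq, (counts, desc) in table.items():
        # Handler names sharing one IRQ are comma-separated.
        if handler in desc.replace(",", " ").split():
            found[irq] = sum(counts)
    return found
```

Parsing once before and once after a hot-insert and comparing `handler_irqs(table, "sata_sil24")` separates "no handler registered" (an empty dict) from "handler registered, zero interrupts delivered" (e.g. `{"19": 0}`).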
>>>>>
>>>>> All the runtime_status values were the same after every tested step, so again
>>>>> there are no diffs to show, but here are the values confirming that laptop-mode-tools
>>>>> enabled power saving:
>>>>>
>>>>> /sys/bus/pci/devices/0000:00:00.0/power/runtime_status:suspended
>>>>> /sys/bus/pci/devices/0000:00:02.0/power/runtime_status:active
>>>>> /sys/bus/pci/devices/0000:00:16.0/power/runtime_status:suspended
>>>>> /sys/bus/pci/devices/0000:00:1a.0/power/runtime_status:suspended
>>>>> /sys/bus/pci/devices/0000:00:1b.0/power/runtime_status:active
>>>>> /sys/bus/pci/devices/0000:00:1c.0/power/runtime_status:suspended
>>>>> /sys/bus/pci/devices/0000:00:1c.1/power/runtime_status:active
>>>>> /sys/bus/pci/devices/0000:00:1c.3/power/runtime_status:active
>>>>> /sys/bus/pci/devices/0000:00:1c.4/power/runtime_status:active
>>>>> /sys/bus/pci/devices/0000:00:1c.7/power/runtime_status:active
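The per-device runtime_status values above can be snapshotted before and after each hotplug step and compared automatically. A minimal sketch, assuming each PCI device directory under /sys/bus/pci/devices may expose power/runtime_status (devices without it are skipped); `snapshot_runtime_status` and `diff_snapshots` are hypothetical helpers:

```python
import os
from typing import Dict, Optional, Tuple

def snapshot_runtime_status(root: str = "/sys/bus/pci/devices") -> Dict[str, str]:
    """Map each device address under `root` to its power/runtime_status value."""
    snapshot: Dict[str, str] = {}
    for dev in sorted(os.listdir(root)):
        path = os.path.join(root, dev, "power", "runtime_status")
        try:
            with open(path) as f:
                snapshot[dev] = f.read().strip()
        except OSError:
            continue  # no runtime PM attribute, or the device vanished mid-scan
    return snapshot

def diff_snapshots(
    before: Dict[str, str], after: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {device: (old, new)} for devices whose status changed, appeared or disappeared."""
    changed: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for dev in sorted(set(before) | set(after)):
        if before.get(dev) != after.get(dev):
            changed[dev] = (before.get(dev), after.get(dev))
    return changed
```

Taking one snapshot before and one after a hot-remove or hot-insert and printing `diff_snapshots(before, after)` lists only the devices whose state changed; an empty result is the "no diffs to show" case.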
>>>>
>>>> It appears that 1c.7 is successfully identified as a hotplug-capable PCIe
>>>> port and is never put into the suspended state.
>>>
>>> Yes. To be honest, having now tested your previous two patches
>>> on 3.9-rc1, I can confirm that syslog logging is broken with all three
>>> of your patches. I fear that with the pciehp problems we are hitting not
>>> a power-saving issue but upstream /proc or /sys files being outdated.
>>> Otherwise I can't figure out why disabling laptop-mode-tools at runtime
>>> and doing the "find /sys .... | while ... echo "on" > $f" trickery
>>> does not help to get pciehp working. That would have fixed acpiphp,
>>> at least on the 3.8 kernel. I see that sata_sil24 is not loaded automatically
>>> during hot-insertion. At such times lspci seems to report 0xff for the 11:00
>>> eSATA card, and /proc/iomem reports stale memory regions used by 11:00, while
>>> /proc/interrupts says no IRQ is assigned to sata_sil24 (well, sata_sil24
>>> is not loaded according to lsmod; lspci should report sata_sil24 as well,
>>> but since the 11:00 entry is broken and shows 0xff, maybe it cannot
>>> report whether sata_sil24 is loaded).
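Whether a function really answers config reads (rather than returning all ones, the 0xff seen in lspci) can be checked from its vendor ID, the little-endian 16-bit value at offset 0 of config space; a read that no device answers returns all ones, so 0xFFFF means nothing responded. A minimal sketch, assuming the card's sysfs config file is read directly; the address 0000:11:00.0 (domain 0000 and function 0) is an assumption derived from the "11:00" in this thread, and the helper names are hypothetical:

```python
import struct

# A config read that no device answers completes as all ones,
# so an absent or unresponsive function shows vendor ID 0xFFFF.
NO_DEVICE = 0xFFFF

def vendor_id(config: bytes) -> int:
    """Return the 16-bit vendor ID stored little-endian at config offset 0."""
    if len(config) < 2:
        raise ValueError("need at least 2 bytes of config space")
    (vid,) = struct.unpack_from("<H", config, 0)
    return vid

def device_responds(config_path: str) -> bool:
    """True if the config file exists and its vendor ID is not all ones."""
    try:
        with open(config_path, "rb") as f:
            header = f.read(4)
    except OSError:
        return False  # device node already removed from sysfs
    return len(header) >= 2 and vendor_id(header) != NO_DEVICE
```

For example, `device_responds("/sys/bus/pci/devices/0000:11:00.0/config")` (hypothetical address) would return False both when the entry is gone and when it is stale and reads back as 0xff.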
>>>
>>> I will post a few more details as a proper reply to your other patch,
>>> where I managed to get yet another stack trace, about the eSATA card thought
>>> to be in the D3 state. The card had physically been ejected, and a mere
>>> modprobe sata_sil24 caused sata_sil24 to use some outdated data. I will dive
>>> into that now.
>>>
>>>
>>>
>>>>
>>>> And from your description below, it appears that hot-add and hot-remove
>>>> of the eSATA card work for you, don't they?
>>>
>>> Presence detection (PresDet) works fine, I think, yes. Sometimes I see this in the lspci -vvv diffs:
>>>
>>> -Control: I/O+ ... BusMaster+
>>> +Control: I/O- ... BusMaster-
>>
>> But after hot-insert, can you use your eSATA card?  It appears that it
>> is detected properly.
> 
> I can't say about those two. But under pciehp, what is broken is hot-removal.
> I think the rest is just a downstream consequence.
> 
>>
>>> and sometimes 
>>>
>>> -        Latency: 0, Cache Line Size: 64 bytes
>>> +        Latency: 0
> 
> It seems to me that bridges in lspci output have 'Latency: 0' while end devices have
> the Cache Line Size as well.
> 
> When the card is hot-inserted after a previous hot-removal and seems "dead",
> lspci says:
> Control: I/O- Mem- BusMaster-
> Interrupt: pin A routed to IRQ 19
> and there is no 'Latency:' and no 'Cache Line Size:' in the output for the 11:00 device.
> 
> But please realize this output is likely unreliable, because a previous eject of the card did not
> fully release resources. When the slot was empty, lspci reported 0xff, and when it is
> populated it likely reports some garbage. Until the bug causing 'stale' data to be reported
> (the 'Re: 3.8.2: stale pci device info for a previously inserted express card' thread)
> is fixed, I wonder what we can trust in this output.
> 
>>>
>>> or even the Latency: line gone completely from the lspci -vvv output. Why is that?
>>> I think debug checks and prints in the kernel are necessary.
>>>
>>>
>>> How do these relate to /proc/interrupts not showing an IRQ for the 11:00 device?
>>> Does that prevent automatic sata_sil24 loading once the card is inserted? Would
>>> you please add some extra debug prints and checks to the kernel?
>>>
>>> Please also take into consideration the "3.8.2: stale pci device info for a previously inserted express card"
>>> thread for a list of chimeric entries reported by lspci. That could tell you which values
>>> are cached and invalid. Hopefully some consistency checks could be done between the values
>>> read by lspci and those in /proc and /sys.
>>>
>>>
>>>
>>> Do you already know why almost nothing is logged by the kernel with any of your
>>> three patches applied (v1 sent on 03/29/13 08:41, v2 sent on 03/29/13 09:20, v3 sent on
>>> 03/30/13 11:54)?
>>
>> No, I don't know why. Does the unpatched upstream kernel produce kernel log messages?
> 
> OK, vanilla 3.9-rc1 also prints nothing hotplug-related to syslog (only pciehp
> tested). Logging itself works; as I said, rmmod sata_sil24 is logged. So, sorry,
> your patches did NOT break logging.
> 
> Martin
> 
> 
>>
>> Best Regards,
>> Huang Ying
>>
>>> I did not test the xHCI port behavior with any of your three patches because I
>>> disabled USB support in this kernel altogether for the 3.9-rc1 tests, and I would like to stick
>>> with that until we fix the pciehp issue. I stepped into the big testing game rather late;
>>> I believe pciehp has not been working since 3.5/3.6, so I don't think
>>> the 3.9-rc1-based tests will be much different from earlier kernels.
>>>
>>> For a broader view: on the 3.8 series we will hopefully get to a fix for the
>>> PME# issues in the meantime. I think I reported quite a number of potential problems yesterday.
>>> After that, I will check how xHCI behaves on 3.9, but I believe the PME#-related fix from
>>> 3.8 will also apply to 3.9, so xHCI won't have problems there anymore.
>>>
>>>
>>> Martin
>>
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html