Re: Dell Vostro 3550: pci_hotplug+acpiphp require 'pcie_aspm=force' on kernel command-line for hotplug to work

Hi Bjorn,
  thank you for your time on this issue.

Bjorn Helgaas wrote:
> On Wed, Jan 9, 2013 at 4:10 PM, Martin Mokrejs
> <mmokrejs@xxxxxxxxxxxxxxxxxx> wrote:
>> Hi,
>>   I am following up on a former thread
>> Re: 3.2.11: PCI Express card cannot be re-detected withing cca 60sec timeframe
>> about the same issue. I think I found some new info while playing with 3.7.1 kernel.
>> It happened that hotplug of my express cards stopped working, which made me
>> dive in and figure out what I did to my .config, and which combinations
>> of drivers and kernel command-line parameters work and which do not. This email will
>> cover just one case.
>>
>> On this Dell Vostro 3550 express card slot works if kernel is without pciehp
>> altogether and pci_hotplug+acpiphp are loaded as modules later on. The problem
>> is that I must use pcie_aspm=off.
> 
> I confess I am completely bewildered here.  Something is clearly badly
> broken, but I'm having a hard time figuring out exactly what it is.  I
> think I'm overwhelmed by all the data :)
> 
> From your previous "3.2.11: PCI Express card cannot be re-detected
> withing cca 60sec timeframe" thread, I think:
> 
> 1) With pciehp, insertion and removal of an NEC uPD720200 USB3.0 card
> doesn't work correctly [1].  The insertion/removal events don't seem
> to be detected immediately.

... except for the FireWire, serial/parallel and eSATA port-providing cards in 2) and 3).
The one doing SATA is based on the Silicon Image 3132 chip and uses the sata_sil24
driver.

> 2) Insertion/removal of firewire card works correctly [2]

Yes.

> 3) Insertion/removal of AXAGO ECA-SP serial/parallel card works correctly [3]

Yes.

> 4) When the xhci driver is not loaded, insertion/removal events of the
> NEC USB3.0 card *are* detected correctly [4]

And my suspicion was that when a USB device is attached to the
ExpressCard USB3.0 controller, usb-storage (or whichever driver it is)
realizes much earlier (in time) that the ExpressCard is gone, and the system thus
behaves properly. I think something is delayed by the xhci driver until a poll
happens every 60 seconds. With usb-storage the change becomes visible
"immediately". Or is the card not PCI or PCIe hotplug capable [6]?
Could that be the difference in behavior?


> 
> The ExpressCard slot is below the 00:1c.7 root port, and this port
> supports native PCIe hotplug.  When CONFIG_HOTPLUG_PCI_PCIE=y, Linux
> requests control over PCIe native hotplug, and I think your BIOS
> grants it.
> 
> In this thread ("Dell Vostro 3550: pci_hotplug+acpiphp require
> 'pcie_aspm=force' on kernel command-line for hotplug to work"), I
> think you are saying that if you disable pciehp and use acpiphp and
> the "pcie_aspm=off" parameter, the ExpressCard slot works perfectly.
> 
> But I think it's a bad idea to go down the road of using acpiphp.

Unfortunately I was forced to switch to acpiphp because, after commit
0d52f54e2ef64c189dedc332e680b2eb4a34590a (as diagnosed by Yinghai),
pciehp stopped working (although it worked badly anyway).
The commit went in around the 3.5 kernel. For me, practically, I had to switch
from "pcie_aspm=force" (kernels 3.2 - 3.4) to "pcie_aspm=off" (3.7 for sure).
See the note from Yijing in [5].

> Native PCIe hotplug (pciehp) is the default when it is supported, and
> as far as I can tell, it *is* supported on this system.  If we had
> some indication that it's not supported, e.g., if the BIOS declined to
> grant us control over PCIe native hotplug, then of course we would
> fall back to using acpiphp.  But I don't think we do, so we should
> figure out how to make pciehp work.

I do not understand the differences between PCI hotplug and PCIe hotplug, or what
pciehp and acpiphp each require, so I do not want to comment on that.
I just linked to Yijing's comments in other threads [5, 6] and leave it to you
to judge which is the more generic option.

Yinghai in the 3.2.x thread postulated it is a BIOS or silicon bug
incorrectly providing PresDet status. I was glad that I could show
that with acpiphp the PresDet is *always* correct (3.7 kernel)
*provided* I disabled the MediaCard reader in BIOS.


So although I do not believe in a hardware bug in the case of PresDet reporting,
I smell there is one in the cross-interaction between the ExpressCard slot,
the EHCI port and the MediaCard reader (via EHCI as well), all hooked up to the SandyBridge
C6/C200 Intel chip. What I saw is that the MediaCard reader gets detected when
an ExpressCard is plugged into its slot (that is, much later after bootup, and
triggered only by ExpressCard insertion). I speculate further that the EHCI port
sometimes resets again just due to some ExpressCard interference. I reported such
problems in the past on the usb mailing list, but we could not draw any conclusion
from it except that my external USB hub might be bad. Buying a new
one did not help. Instead, I disabled the MediaCard reader in BIOS.
Note: My motherboard has already been exchanged, and so has the metallic
ExpressCard slot assembly.

> 
> If it's really true that pciehp works perfectly except when xhci is
> loaded, that seems like a good clue, and we should look for some
> interaction between xhci and pciehp.

And also, why is the presence of the FireWire, RS232/LPT and sata_sil24 cards
reported correctly by pciehp, but not that of the NEC-based USB3 card?

The (mis)behavior with the USB3 card was that its removal was not reported.
On subsequent re-insertion of the card the software status bits were
wrong and pciehp reported Surprise removal. The bits were not adjusted
properly (I am intentionally not saying they should have been reset).
If I again unplugged the card, the PresDet correctly reported the card
was just removed. In other words, every second (even) removal of a card resulted
in correct slot status values reported. On every odd removal, the status
reported by pciehp was wrong.

> 
> Martin, can you confirm my assumptions above or correct any mistakes I made?

Several questions I had were never answered.

Why is the IRQ40 not reported in pciehp (see the original message in this thread)?
Because it is really not used, compared to pciehp?

Is acpiphp capable of reporting Surprise removal like pciehp does, or is acpiphp silent
just because it does not print a similar message?

sata_sil24 should not be allowed to bind to a newly hotplugged controller
too quickly because sometimes the card slips out immediately. A delay of 6 seconds
would help to prevent a possible Oops, as shown in the kernel timings in this email
thread (nicely colored at https://patchwork.kernel.org/patch/1957681/, btw).

Could the CorrErr+ device status bits prevent PresDet change from PresDet+ to PresDet-?

Once I saw that the "Changed: MRL. PresDet. LinkState." line under SltSta: in lspci
output reported the PresDet change incorrectly, while the "SltSta: Status:" line itself
reported the correct current value. How is this value determined?
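
For reference, my understanding is that both lines come from the same 16-bit
Slot Status register: the Status flags show current state, while the Changed
flags are separate latched bits that stay set until software clears them by
writing 1. A minimal sketch of that decoding, assuming the bit layout from the
PCIe spec as mirrored in Linux's pci_regs.h. The helper names and the sample
value 0x0140 are only illustrative, not taken from any real capture:

```python
# Bit masks of the PCIe Slot Status register (names follow Linux's pci_regs.h).
PCI_EXP_SLTSTA_ABP = 0x0001    # Attention Button Pressed
PCI_EXP_SLTSTA_PFD = 0x0002    # Power Fault Detected
PCI_EXP_SLTSTA_MRLSC = 0x0004  # MRL Sensor Changed
PCI_EXP_SLTSTA_PDC = 0x0008    # Presence Detect Changed
PCI_EXP_SLTSTA_CC = 0x0010     # Command Completed
PCI_EXP_SLTSTA_MRLSS = 0x0020  # MRL Sensor State
PCI_EXP_SLTSTA_PDS = 0x0040    # Presence Detect State
PCI_EXP_SLTSTA_EIS = 0x0080    # Electromechanical Interlock Status
PCI_EXP_SLTSTA_DLLSC = 0x0100  # Data Link Layer State Changed

# Flags in the order lspci prints them on the "Status:" line.
STATUS_FLAGS = [
    ("AttnBtn", PCI_EXP_SLTSTA_ABP),
    ("PowerFlt", PCI_EXP_SLTSTA_PFD),
    ("MRL", PCI_EXP_SLTSTA_MRLSS),
    ("CmdCplt", PCI_EXP_SLTSTA_CC),
    ("PresDet", PCI_EXP_SLTSTA_PDS),
    ("Interlock", PCI_EXP_SLTSTA_EIS),
]

# Flags in the order lspci prints them on the "Changed:" line.
CHANGED_FLAGS = [
    ("MRL", PCI_EXP_SLTSTA_MRLSC),
    ("PresDet", PCI_EXP_SLTSTA_PDC),
    ("LinkState", PCI_EXP_SLTSTA_DLLSC),
]


def _render(value, flags):
    """Render each flag as NAME+ (bit set) or NAME- (bit clear)."""
    return " ".join(
        f"{name}{'+' if value & mask else '-'}" for name, mask in flags
    )


def decode_sltsta(value):
    """Return the (Status, Changed) lines for a raw Slot Status value."""
    return (
        f"Status: {_render(value, STATUS_FLAGS)}",
        f"Changed: {_render(value, CHANGED_FLAGS)}",
    )


if __name__ == "__main__":
    # Hypothetical raw value: card present, only the link-state change latched.
    for line in decode_sltsta(0x0140):
        print(line)
```

If this is right, a "PresDet-" on the Changed line does not by itself say the
slot is empty; it only says the change bit is not currently latched.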

In the original email in this thread I showed, below the line
"And showing, that acpiphp works here while 'pcie_aspm=off':", that acpiphp
also misreports PresDet when the MediaCard reader is enabled in BIOS. I think
the acpiphp driver should complain if there are conflicting values. For example, is it
meaningful to see the following in lspci output: "Changed: MRL- PresDet- LinkState+"?
From "SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-" we know
the card is in the slot, so why is the "Changed:" line summary wrong?
Why does the kernel claim the card has a virtual ROM after hotplug while a coldplugged
card reports "Expansion ROM at f6c00000 [disabled] [size=512K]"? Why the differences
in cache line sizes, etc.? Simply, why do both scenarios not lead to exactly the same card
configuration?
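
To compare such captures mechanically (hotplug versus coldplug, or MediaCard
reader enabled versus disabled), something like the following could be used.
It is only a sketch: it assumes the flags appear as NAME+ / NAME- tokens as in
the two lines quoted above, the function names are made up, and the second
capture in the demo is a hypothetical line, not real output from my machine:

```python
import re

# A flag is a word immediately followed by '+' or '-' and then whitespace or
# end of line; labels such as "SltSta:" or "Status:" do not match.
FLAG_RE = re.compile(r"(\w+)([+-])(?=\s|$)")


def parse_flags(line):
    """Turn 'PresDet+ Interlock-' style text into {'PresDet': True, ...}."""
    return {name: sign == "+" for name, sign in FLAG_RE.findall(line)}


def diff_flags(before, after):
    """Return the flags present in both captures whose values differ."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


if __name__ == "__main__":
    # First line is quoted from this email; the second one is hypothetical.
    inserted = parse_flags(
        "SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-"
    )
    removed = parse_flags(
        "SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-"
    )
    print(diff_flags(inserted, removed))
```

Logging this difference after every insertion and removal would show directly
which captures disagree.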



> 
> Bjorn
> 
> 
> [1] http://marc.info/?l=linux-pci&m=133236584826563&w=1
> [2] http://marc.info/?l=linux-pci&m=133457374507707&w=1
> [3] http://marc.info/?l=linux-pci&m=133460527222076&w=1
> [4] http://marc.info/?l=linux-kernel&m=133547823904339&w=1
> 

[5] http://marc.info/?l=linux-kernel&m=135928915003566&w=4
[6] http://marc.info/?l=linux-kernel&m=135937601131152&w=4



I hope I made things a bit clearer.
Martin