[ +linux-pci and Yinghai, as they have already suffered through those many
emails on individual threads, so one overview email hopefully won't harm ] ;-)

Martin Mokrejs wrote:
>
> Bjorn Helgaas wrote:
>> On Tue, Apr 2, 2013 at 9:02 AM, Martin Mokrejs
>> <mmokrejs@xxxxxxxxxxxxxxxxxx> wrote:
>>> Hi Ying,
>>>
>>> huang ying wrote:
>>
>>>> And please give me the full dmesg for boot and incremental dmesg for
>>>> operations.
>>>
>>> The incremental bits are here; the full dmesg I will send only directly
>>> to your email, due to its size.
>>
>> Is there a bugzilla for this issue?  Please attach the complete dmesg
>> there or somewhere similar so we can all benefit.
>
> I changed my mind. I am attaching the dmesg here but omitting linux-acpi
> list. After I hear a proposal from Rafael/Bjorn I will open separate bugs.
> I thought that the threads I started so far were enough but yes, dmesg
> files don't pass through list filters so I should move that to bugzilla.
>
> So far my view of the bugs was:
>
> 1) acpiphp hotplug broken due to upstream pcieport 1c.7 PME# enabled
>    (eSATA-based card)

Fixed by Ying Huang's port_dbg.patch applied over 3.8.5 (it fixes acpiphp
hotplug of eSATA and Firewire cards, but NOT the hotplug of a NEC-based USB3
card, hence bug 4) below). Now I can continue using laptop-mode-tools.

> 2) xHCI dead due to its suspend - 3.8 series and above

Not fixed by port_dbg.patch applied over 3.8.5. Interestingly, a NEC-based
XHCI card *in an express card slot* does not suffer from this suspend issue,
although it is put into suspend when a device is unplugged.

> 3) pciehp completely broken since about 3.6, still 3.9-rc5

Even with 3.9-rc5 plus patch 2368081 and port_dbg.patch from Ying Huang this
is still broken (the eject of a cold-plugged device from an express card
slot). That results in /proc/interrupts claiming IRQ 19 is still used by the
driver. A non-forced but manual 'rmmod sata_sil24' removes IRQ 19 from the
listing.
The rmmod also removes the association with sata_sil24 from /proc/iomem, but
the device 11:00 is retained in the file with its memory ranges. lspci
provides, as I have described many times, conflicting information. Actually,
I trust lspci more than the /proc/ files.

>
> There is one more which actually brought me into all of this in May 2012
> at about 3.2.x kernels:
>
> 4) Even when upstream port 1c.7 has its control forced to 'on', hot removal
> of USB3 express card is broken, only every second eject is recognized.
> It is likely related to xhci_hcd having a special privilege to handle IRQ/PM
> in its own way. In contrast, Firewire and eSATA cards work under the same
> circumstances. I see different sleep states listed as supported by those
> cards but my bet is that is due to the exceptional xhci_hcd privilege.
> I briefly repeated that already with 3.9-rc5.

Still broken even with port_dbg.patch applied over 3.8.5. It turns out the
unnoticed ejects and inserts are actually detected, but later, with a delay
of 30 sec or so. Hmm, in my original thread back in 2012 I said 60 sec delay,
but it seems likely to be still the same problem:
3.2.11: PCI Express card cannot be re-detected withing cca 60sec timeframe

Before I forget, I will sketch several more bugs I hit; all are documented
in my postings from the last week or two. I can provide the URLs to those
postings already in the archives and maybe summarize them in bugzilla, after
we agree what will be worked on and where (email ... bugzilla), under the
best matching subject you will propose.

5) lspci causes wake and suspend of pcieport-handled devices. I fear this is
not good. Maybe it does the same to other pci devices, but the "problem" is
that no other pci drivers report the same type of message. I would like to
see the PME# enabled/disabled messages generated by other drivers as well,
ideally by some upstream, common driver.

6) sata_sil24 sometimes initializes badly under pciehp.
Once pciehp is fixed, I would still like to get the init of sata_sil24 fixed
as well. There are two wrong paths in the driver. One is:

[  899.894862] sata_sil24 0000:11:00.0: version 1.1
[  899.894880] sata_sil24 0000:11:00.0: enabling device (0000 -> 0003)
[  899.985994] sata_sil24 0000:11:00.0: failed to clear port RST
[  900.086097] sata_sil24 0000:11:00.0: failed to clear port RST
[  900.086119] sata_sil24 0000:11:00.0: enabling bus mastering

while the other is:

[  974.021661] pcieport 0000:00:1c.0: PME# disabled
[  974.041697] pcieport 0000:00:1c.7: PME# disabled
[ 1048.450168] sata_sil24 0000:11:00.0: version 1.1
[ 1048.463692] sata_sil24 0000:11:00.0: Refused to change power state, currently in D3
[ 1048.563818] sata_sil24 0000:11:00.0: failed to clear port RST
[ 1048.663935] sata_sil24 0000:11:00.0: failed to clear port RST

Both lead to a broken device, and I would prefer the driver to fail to load.
It seems they are at least in part related to an early device eject, while
the driver had not yet turned down an unused external SATA port.

7) It seems Rafael or Bjorn have a clue why I sometimes see only PME#
disabled or only PME# enabled in dmesg for a particular device, and I am
worried about when it was silently switched to the other state. I would like
to hear that this can be prevented in future by some cross-checks, by design.

8) I don't know whether one can ensure that a driver releases either both
the IRQ and the memory ranges it has allocated, or nothing at all, or that
an oops happens, whatever. Maybe something could track what the driver
grabbed and make sure both are released. Even a background scan of /proc
files would be fine. The disagreement with lspci is not good.

9) In the thread
Re: 3.8.2: stale pci device info for a previously inserted express card
I already showed an example that chimeric entries can appear in 'lspci -vvv'
output. Some data describe the card previously loaded in an Express Card
slot, while the rest describe the one currently loaded in the slot.
This might lead to an explanation of why there are lines like these in
lspci:

a) Latency: 0
   Latency: 0, Cache Line Size: 64 bytes
   or the Latency: line missing altogether

b) [virtual] Expansion ROM at f6c00000 [disabled] [size=512K]
   Expansion ROM at f6c00000 [size=512K]

c) Region 0: Memory at f6c84000 (64-bit, non-prefetchable) [size=128]
   Region 0: Memory at f6c84000 (64-bit, non-prefetchable) [disabled] [size=128]

If the kernel does not give a hint about what is wrong with a device/driver,
then maybe lspci could do a runtime check and give a more useful,
user-oriented warning.

>>
>> I think we have two problems that may be relevant to this discussion.
>>
>> 1) The _OSC "PCI Express Capability Structure control" bit.  I don't
>> think Linux pays attention to whether the BIOS has granted us control
>> over the capability, so we may do things to it that the BIOS doesn't
>> expect.
>>
>> 2) acpiphp currently uses the presence of _ADR/_EJ0/_RMV to detect
>> hotplug slots.  I don't think this is sufficient (see
>> https://bugzilla.kernel.org/show_bug.cgi?id=54981 for details).
>> Therefore, I don't think pci_bus_has_hotplug_slots() in port_dbg.patch
>> can be accurate.  I think it returns "false" for some buses where it
>> should return "true," such as the ExpressCard slot on Chris Clayton's
>> system (see bug 54981).
>
> But I do not know whether and how to split the above 4 bugs into maybe more,
> better described bugs. I will likely repeat them with 3.8.5 and 3.9-rc5;
> I got quite skilled at running diff over the last days and weeks. ;-)
>
> I am waiting for some answers from you before opening bug reports.
> Please tell me how to name them and what data you want to get where.
> After I open them I will try to (re)attach your patches. Ying, do you have an
> update for the port_dbg.patch per Bjorn's comments about
> pci_bus_has_hotplug_slots() being inaccurate? I would gladly wait for an
> updated patch catching rather more scenarios than less.
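
To make the cross-check wished for in item 8) a bit more concrete, here is a
minimal userspace sketch in Python. It only compares what /proc/interrupts
and /proc/iomem still attribute to a driver name. The function names and the
sample text are hypothetical, made up for illustration; they are not taken
from the actual logs, and this is in no way a proposal for how the kernel
itself should track resources.

```python
import re

# Hypothetical sample contents, NOT copied from the real machine.
SAMPLE_INTERRUPTS = """           CPU0       CPU1
 19:       1234          0   IO-APIC-fasteoi   sata_sil24, ehci_hcd:usb1
 23:         56          0   IO-APIC-fasteoi   ehci_hcd:usb2
"""

SAMPLE_IOMEM = """f6c00000-f6cfffff : PCI Bus 0000:11
  f6c80000-f6c83fff : 0000:11:00.0
"""


def irq_users(interrupts_text, driver):
    """Return the IRQ numbers whose /proc/interrupts line names `driver`."""
    irqs = []
    for line in interrupts_text.splitlines():
        match = re.match(r"\s*(\d+):", line)
        if not match:
            continue  # header line or non-numeric entries such as NMI/LOC
        # Handler names at the end of the line are separated by ", ".
        tokens = re.split(r"[,\s]+", line.strip())
        if driver in tokens:
            irqs.append(int(match.group(1)))
    return irqs


def iomem_ranges(iomem_text, label):
    """Return the address ranges in /proc/iomem text labelled exactly `label`."""
    ranges = []
    for line in iomem_text.splitlines():
        rng, sep, name = line.strip().partition(" : ")
        if sep and name == label:
            ranges.append(rng)
    return ranges


def leftover_resources(interrupts_text, iomem_text, driver):
    """Report what is still attributed to `driver` in both files.

    "consistent" is True only if the driver holds both kinds of
    resources or neither of them.
    """
    irqs = irq_users(interrupts_text, driver)
    mem = iomem_ranges(iomem_text, driver)
    return {"irqs": irqs, "iomem": mem, "consistent": bool(irqs) == bool(mem)}


if __name__ == "__main__":
    print(leftover_resources(SAMPLE_INTERRUPTS, SAMPLE_IOMEM, "sata_sil24"))
```

On a live system one would feed it open("/proc/interrupts").read() and
open("/proc/iomem").read() instead of the sample strings. It only matches by
name, so it cannot tell which of two identical cards a leftover belongs to.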

Feel free to comment on the list of presumed bugs and to add more that you
saw in the logs or diffs yourself (especially those downstream, secondary
bugs which will soon be masked by the hotplug issues being *fixed*). ;)
I am quite optimistic. ;))

The above lists don't contain URLs, but that can all be sorted out in the
respective bugzilla entries.

Thank you,
Martin