On Sat, 14 May 2022 11:14:00 +0200
Lukas Wunner <lukas@xxxxxxxxx> wrote:

> On Fri, May 13, 2022 at 06:57:29PM +0200, Pali Rohár wrote:
> > To answer your questions: Config space of Aardvark Root Port does not
> > conform to PCIe base spec. It does not implement DLLLARC, nor DLLSCE and
> > lot of other bits. Plus it has Type 0 header (not Type 1). And due to
> > these reasons, pci-aardvark.c driver implements "emulation" of the
> > Root Port and implements some of the functionality via custom aardvark
> > registers. So Root Port would be presented to kernel and also to
> > userspace as PCI Bridge device with Type 1 header and with PCIe
> > registers required by linux kernel.
> >
> > During my testing of kernel hotplug code last year, I figured out that
> > kernel was waiting for event which never happened. And so it was needed
> > to "fix" kernel to not try to enable DLLSCE because it did nothing.
>
> Could you please reproduce this and add the following on the command line:
>
> log_buf_len=10M pciehp.pciehp_debug=1 dyndbg="file pciehp* +p"
> ignore_loglevel
>
> Then open a bug at bugzilla.kernel.org, attach full dmesg output
> as well as full "lspci -vv" output and send the bugzilla link to me.
>
> (Obviously, revert patches 6 and 7 when trying to reproduce it.)
>
> So a PDC event should be sufficient to bring the slot up or down,
> a DLLSC event should not be necessary. Enabling DLLSC should not
> make a difference on a controller which doesn't support it.
> I just double-checked the code and I do not see where we'd wait
> for a DLLSC event which never comes.
>
> Don't worry, if we come to the conclusion that your proposed fix
> is the only solution, I'm fine with it, but at this point I'd
> like to get a better understanding what is really going on.
> Perhaps there is some other issue in pciehp that this patch
> just papers over. Once you provide the dmesg debug output
> I'll be able to analyze that.

Dear Lukas,

we have tried to reproduce the bug where the kernel was waiting for an
event which never happened, the bug that Pali remembered from his work
on the pciehp code.

We have concluded that it doesn't concern the DLLSC patch (06/18), only
the Command Completed Interrupt patch (07/18), and even there it seems
that the patch is not needed to avoid the bug. It seems that when Pali
was studying the bug, he did two things:

1. he made enabling the Command Completed Interrupt conditional on the
   NCCS bit not being set
2. he made the aardvark driver report the NCCS bit via the emulated
   bridge.

It turns out that only the second change is needed, since the pciehp
code checks the NCCS bit in pcie_wait_cmd() and does not wait for the
interrupt if NCCS is set.

Anyway, we still think that both patches make sense, at least so that an
interrupt isn't reported as both enabled and unsupported when dumping
the configuration space.

So I will resend these patches with updated commit messages.

Marek
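
P.S. The NCCS reasoning above can be sketched roughly as below. This is
only a simplified, stand-alone model and not the actual pciehp code: the
two bit values are the ones from include/uapi/linux/pci_regs.h, but the
helper names slot_signals_cmd_complete() and slot_ctl_irq_bits() are
made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_SLTCAP_NCCS 0x00040000 /* No Command Completed Support */
#define PCI_EXP_SLTCTL_CCIE 0x0010     /* Command Completed Interrupt Enable */

/*
 * Hypothetical helper: true if the port signals Command Completed events,
 * i.e. if it makes sense to wait for one after a Slot Control write.
 * With NCCS set there is nothing to wait for.
 */
static bool slot_signals_cmd_complete(uint32_t slot_cap)
{
	return !(slot_cap & PCI_EXP_SLTCAP_NCCS);
}

/*
 * Hypothetical helper: given the Slot Control interrupt bits we would like
 * to enable, drop CCIE when the port reports NCCS, so that the interrupt
 * is never shown as enabled while being unsupported.
 */
static uint16_t slot_ctl_irq_bits(uint32_t slot_cap, uint16_t wanted)
{
	if (!slot_signals_cmd_complete(slot_cap))
		wanted &= (uint16_t)~PCI_EXP_SLTCTL_CCIE;
	return wanted;
}
```

The first helper corresponds to item 2 (reporting NCCS is enough to stop
the wait), the second to item 1 (not enabling an interrupt the port does
not support).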