On Thursday 17 February 2022 13:59:38 Marek Vasut wrote:
> On 2/17/22 12:29, Pali Rohár wrote:
> > On Monday 31 January 2022 13:53:41 Pali Rohár wrote:
> > > On Saturday 29 January 2022 05:39:40 Marek Vasut wrote:
> > > > On 1/24/22 10:37, Pali Rohár wrote:
> > > > > On Monday 24 January 2022 06:46:47 Marek Vasut wrote:
> > > > > > On 1/23/22 17:49, Pali Rohár wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > > > > I must admit that this patch from its initial version evolved into giant hack...
> > > > > > > > > https://lore.kernel.org/linux-pci/20210514200549.431275-1-marek.vasut@xxxxxxxxx/
> > > > > > > > >
> > > > > > > > > During review of the previous patch I have asked some important
> > > > > > > > > questions but I have not got any answer to them. So I'm reminding it:
> > > > > > > > > https://lore.kernel.org/linux-pci/20210805183024.ftdwknkttfwwogks@pali/
> > > > > > > > >
> > > > > > > > > So could please answer what happens when PCIe controller is in some
> > > > > > > > > non-L* state and either MMIO happen or config read happens or config
> > > > > > > > > write happens?
> > > > > > > >
> > > > > > > > What kind of non-L state ?
> > > > > > >
> > > > > > > E.g. Hot Reset, Detect, Polling, Configuration or Recovery.
> > > > > > >
> > > > > > > > Do you have some specific test which fails ?
> > > > > > >
> > > > > > > Yes, by putting PCIe controller into one of those states. I have already
> > > > > > > wrote you in some previous email to trigger hot reset as this is the
> > > > > > > easiest test and can be done also by userspace (setpci).
> > > > > > >
> > > > > > > Link goes to Recovery state automatically when doing link retraining
> > > > > > > (e.g. by setting RT bit in PCIe Root Port config space) and from
> > > > > > > Recovery to Configuration or directly back to L0. So testing this path
> > > > > > > needs precise timing and repeating it more times to trigger.
> > > > > > >
> > > > > > > So the easiest test is really via PCIe Hot Reset by setting Secondary
> > > > > > > Bus Reset bit in Bridge Control register of PCIe Root Port. After this
> > > > > > > is link in Hot Reset and does not go back to L0 until you clear that
> > > > > > > bit. So in this state you can do all these operations which cause
> > > > > > > aborts, like calling that kernel function which is reading from config
> > > > > > > space which belongs to device on the other end of the PCIe link or doing
> > > > > > > MMIO read / write operation of mapped memory which again belongs to
> > > > > > > other end of PCIe link.
> > > > > > >
> > > > > > > Or instead of Hot Reset, you can set link disable bit in config space of
> > > > > > > PCIe Root Port. Then link also would not be in L0 state (until you clear
> > > > > > > that bit), so again you have lot of time to do same tests.
> > > > > >
> > > > > > Can you give me the exact setpci invocation ? If so, then I can test this
> > > > > > for you on the hardware.
> > > > >
> > > > > Call "setpci -s $bdf_root_port BRIDGE_CONTROL" with address of the PCIe
> > > > > Root Port device (parent of selected device). This will print value of
> > > > > bridge control register. Logical OR it with value 0x20 (Secondary Bus
> > > > > Reset Bit) and call "setpci -s $bdf_root_port BRIDGE_CONTROL=$new_value".
> > > > > After this call is link in the Hot Reset state and you can do any test.
> > > > > To bring link back, call setpci again with cleared 0x20 bit mask.
> > > > >
> > > > > Similar test you can done also with setting Link Disable bit (bit 4) in
> > > > > PCIe Link Control register. Offset to this register is not static and
> > > > > you can figure it out from lspci -s $bdf_root_port -vv output.
> > > > > Retrain Link is bit 5 in the same register.
> > > >
> > > > Flipping either bit makes no difference, suspend/resume behaves the same and
> > > > the link always recovers.
> > >
> > > Ok, perfect! And what happens without suspend/resume (just in normal
> > > conditions)? E.g. during active usage of some PCIe card (wifi, sata, etc..).
> >
> > PING? Also what lspci see for the root port and card itself during hot reset?
>
> If I recall, lspci showed the root port and card.

This is suspicious. The card should not respond to config read requests
while it is in the Hot Reset state. Could you send the output of
lspci -vvxx for the root port and also for the card during this test?
Maybe the root port has a broken BRIDGE_CONTROL register and did not
actually put the card into the Hot Reset state?
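
To make the test easier to repeat, the Hot Reset sequence quoted above
could be scripted roughly as follows. This is only a sketch. It assumes
setpci from pciutils and root privileges, and the helper names
(set_bits, clear_bits, hot_reset_enter, hot_reset_exit) are made up for
illustration. The only values taken from this thread are the
BRIDGE_CONTROL register name and the 0x20 Secondary Bus Reset bit.

```shell
#!/bin/sh
# Sketch only: assumes pciutils' setpci is available and we run as root.
# 0x20 is the Secondary Bus Reset bit in BRIDGE_CONTROL (from this thread).
SBR_MASK=0x20

# set_bits VALUE MASK: VALUE is hex without 0x prefix (as setpci prints it),
# MASK is hex with 0x prefix. Prints VALUE | MASK as 4 hex digits.
set_bits() {
    printf '%04x\n' $(( 0x$1 | $2 ))
}

# clear_bits VALUE MASK: prints VALUE & ~MASK as 4 hex digits (16-bit register).
clear_bits() {
    printf '%04x\n' $(( 0x$1 & ~$2 & 0xffff ))
}

# hot_reset_enter BDF: read BRIDGE_CONTROL of the root port, set the
# Secondary Bus Reset bit and write the value back.
hot_reset_enter() {
    old=$(setpci -s "$1" BRIDGE_CONTROL) || return 1
    setpci -s "$1" BRIDGE_CONTROL="$(set_bits "$old" "$SBR_MASK")"
}

# hot_reset_exit BDF: clear the Secondary Bus Reset bit again so the link
# can return to L0.
hot_reset_exit() {
    old=$(setpci -s "$1" BRIDGE_CONTROL) || return 1
    setpci -s "$1" BRIDGE_CONTROL="$(clear_bits "$old" "$SBR_MASK")"
}
```

Usage would be something like: hot_reset_enter "$bdf_root_port", then
collect lspci -vvxx for the root port and for the card, then
hot_reset_exit "$bdf_root_port". If the root port really is in Hot
Reset, config reads of the card should not succeed in between.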