On Thursday 16 September 2021 09:32:57 Bjorn Helgaas wrote: > On Wed, Sep 15, 2021 at 12:55:53PM +0200, Pali Rohár wrote: > > On Tuesday 14 September 2021 15:55:26 Bjorn Helgaas wrote: > > > > It is illegal for a device to return CRS after it has returned a > > > successful completion unless an intervening reset has occurred, so > > > drivers and other code should never see it. > > > > > > > And issue is there also with write requests. Is somebody checking return > > > > value of pci_bus_write_config function? > > > > > > Similar case here. The enumeration and wait-after-reset paths always > > > do *reads* until we get a successful completion, so I don't think we > > > ever issue a write that can get CRS. > > > > Yes, in normal conditions we should not see it. > > > > But for testing purposes (that emulated bridge works fine) I'm using > > setpci for changing some configuration. > > > > And via setpci it is possible to turn off CRSSVE bit in which case then > > Root Complex should re-issue request again. > > > > I'm not sure how "legal" it is if userspace / setpci changes some of > > these bits. At least on a hardware with a real Root Port device it > > should be fully transparent. As hardware handles this re-issue and > > kernel then would see (reissued) response. > > If setpci changes bits like these, all bets are off. We can't tell > what happened, so we can't rely on any configuration Linux did. I > think we really should taint the kernel when this happens. For testing purposes, setpci is still a very good tool. > > Test case: Initialize device, then unbind it from sysfs, reset it (hot > > reset or warm reset) and then rescan / reinit it again. Here device is > > permitted to send CRS response. > > > > We know that more PCIe cards are buggy and sometimes firmware on cards > > crashes or resets card logic. Which may put card into initialization > > state when it is again permitted to send CRS response. > > Yep. That's a buggy device and normally we would work around it with > a quirk. This particular kind of bug would be hard to work around, > but a host bridge driver doesn't seem like the right place to do it > because we'd have to do it in *every* such driver. This described firmware crashing & card reset logic I saw in more wifi cards. Sometimes even wifi drivers itself detects that card does not respond and do some its own internal card reset (e.g. iwldvm on laptop). So it very common situation. But I have not seen that these cards on laptop issue CRS response. Maybe because their firmware or PCIe logic bootup too fast (so there is a very little window for CRS response) or because CRS response sent to OS did not cause any issue. So no particular workaround is needed for above described scenario. But anyway, in case that in future there would be need for disabling CRS feature in kernel (e.g. for doing some workaround for endpoint or extended pcie switch) then this re-issuing of config request on CRS response in pci-aardvark.c would be needed to have similar behavior like real HW hen CRS is disabled. And I like the idea if driver is "feature complete" and prepared also for other _valid_ code paths. This is just my opinion.