On Tue, 5 Oct 2021 14:28:26 -0500
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

> My claim is that the spec allows root complexes that retry zero times,
> so we must assume such a root complex exists and we cannot rely on any
> retries.  If such a root complex exists, this patch might fix a
> problem, but only for aardvark.  It would be better to fix the problem
> in a way that works for all PCIe controllers.
>
> I'm playing devil's advocate here, and it's quite possible that I'm
> interpreting the spec incorrectly.  Maybe the Marvell card is a way to
> test this in the real world.
>
> Bjorn

Hello Bjorn,

this is what I understand from Pali's explanation, please correct me if
something is wrong:

- If HW supports the CRSSVE bit, the OS can ask HW to switch from
  HW-retry to SW-retry mode by setting this bit.

- If HW does not support the CRSSVE bit, it means that HW supports only
  HW-retry.

- CRSSVE is disabled by default, and it is optional, so HW is required
  to support HW-retry.

- Linux's PCI core supports handling CRSSVE in probe.c: when HW says it
  supports it, PCI core enables it and retries on 0xffff0001 in
  function pci_bus_wait_crs().

- The Aardvark controller violates the specification: it does not
  support HW-retry even though it is mandatory. Pali is solving this in
  his patch by doing the retry in the driver when CRSSVE is disabled.
  He is able to do this because he gets the information about CRS from
  another channel (another register).

- You want an abstraction in PCI core for what Pali's patch does, so
  that if CRSSVE is not set and someone reads PCI_VENDOR_ID, PCI core
  itself does this retry. Am I correct here?

This could be done by changing Pali's patch so that, instead of
retrying, the pci_ops->read() method returns a value indicating that a
retry should be done (this would be a new value, PCIBIOS_CRS). Then in
access.c, in pci_bus_read_config_dword() (and
pci_user_read_config_dword()), if the pci_ops->read() method returns
PCIBIOS_CRS, the function retries reading the register.

Is this what you mean?

It would make sense to do this if there are other controllers where
HW-retry does not work and the controller instead reports CRS via a
side channel even when CRSSVE is disabled.

Marek

PS: Btw, looking at the code, why do we use these PCIBIOS_* macros? And
then sometimes convert them to error codes with pcibios_err_to_errno()?
Is this some legacy thing? Should this be converted to errno?
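
To make the PCIBIOS_CRS idea above concrete, here is a rough userspace
sketch of the control flow, not kernel code. Everything in it is an
assumption for illustration: the value chosen for PCIBIOS_CRS, the retry
limit, the fake_dev structure and both function names are hypothetical,
and the choice to report "device not found" after giving up is only one
possible policy. Only PCIBIOS_SUCCESSFUL and PCIBIOS_DEVICE_NOT_FOUND
mirror existing kernel definitions.

```c
#include <stdint.h>

#define PCIBIOS_SUCCESSFUL        0x00
#define PCIBIOS_DEVICE_NOT_FOUND  0x86
#define PCIBIOS_CRS               0x8a  /* hypothetical new return value */

#define CRS_MAX_RETRIES 5               /* assumed limit, for illustration */

/* Hypothetical stand-in for a device behind a controller without HW-retry. */
struct fake_dev {
	int crs_left;   /* how many more reads will complete with CRS */
	uint32_t id;    /* vendor/device dword returned once ready */
};

typedef int (*read_fn)(struct fake_dev *dev, int where, uint32_t *val);

/*
 * Models a pci_ops->read() that learns about CRS from a side channel and
 * reports it to the caller instead of retrying on its own.
 */
int fake_controller_read(struct fake_dev *dev, int where, uint32_t *val)
{
	(void)where;
	if (dev->crs_left > 0) {
		dev->crs_left--;
		*val = 0xffffffff;
		return PCIBIOS_CRS;
	}
	*val = dev->id;
	return PCIBIOS_SUCCESSFUL;
}

/*
 * Models the generic wrapper: retry while the controller reports CRS,
 * give up after CRS_MAX_RETRIES additional attempts.
 */
int read_config_dword_retry(struct fake_dev *dev, read_fn read,
			    int where, uint32_t *val)
{
	int tries, ret;

	for (tries = 0; tries <= CRS_MAX_RETRIES; tries++) {
		ret = read(dev, where, val);
		if (ret != PCIBIOS_CRS)
			return ret;
		/* a real implementation would sleep or back off here */
	}

	*val = 0xffffffff;
	return PCIBIOS_DEVICE_NOT_FOUND;
}
```

With this split, a driver like aardvark would only translate its
side-channel CRS indication into the return value, and the retry policy
would live in one place for all controllers.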