On Mon, Jan 30, 2023 at 03:46:06PM +0000, Niklas Cassel wrote:
> On Mon, Jan 30, 2023 at 09:21:11AM -0600, Bjorn Helgaas wrote:
> > On Sat, Jan 28, 2023 at 10:39:51AM +0900, Damien Le Moal wrote:
> > > PCI passthrough to VMs does not work with AMD FCH AHCI adapters: the
> > > guest OS fails to correctly probe devices attached to the controller due
> > > to FIS communication failures.
> >
> > What does a FIS communication failure look like? Can we include a
> > line or two of dmesg output here to help users find this fix?
>
> It looks like this:
>
> [ 22.402368] ata4: softreset failed (1st FIS failed)
> [ 32.417855] ata4: softreset failed (1st FIS failed)
> [ 67.441641] ata4: softreset failed (1st FIS failed)
> [ 67.453227] ata4: limiting SATA link speed to 3.0 Gbps
> [ 72.661738] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
> [ 78.121263] ata4.00: qc timeout after 5000 msecs (cmd 0xec)
> [ 78.134413] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>
> Basically, we can read and write MMIO registers in the AHCI HBA,
> but the communication between the AHCI HBA and the ATA device does
> not work properly.
>
> (Because the AHCI HBA did not get reset when binding/unbinding the
> device.)
>
> The exact same kernel, using the same generic AHCI driver within the VM,
> can communicate perfectly fine when using e.g. an Intel AHCI HBA.
>
> (With both the AMD and Intel AHCI HBAs being bound to the vfio-pci driver
> in the host.)
>
> We can send a v2 with the above dmesg output.

Don't bother, I added the above and applied this to pci/virtualization
for v6.2, thanks!

> > AMD folks: Can you confirm/deny that this is a hardware erratum in
> > this device? Do you know of any other devices that need a similar
> > workaround?
> >
> > > Forcing the "bus" reset method before
> > > unbinding & binding the adapter to the vfio-pci driver solves this
> > > issue. I.e.:
> > >
> > >   echo "bus" > /sys/bus/pci/devices/<ID>/reset_method
> > >
> > > gives a working guest OS, thus indicating that the default flr reset
> > > method is defective, resulting in the adapter not being reset correctly.
> > >
> > > This patch applies the no_flr quirk to AMD FCH AHCI devices to
> > > permanently solve this issue.
> > >
> > > Reported-by: Niklas Cassel <niklas.cassel@xxxxxxx>
> > > Cc: stable@xxxxxxxxxxxxxxx
> > > Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx>
> > > ---
> > >  drivers/pci/quirks.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 285acc4aaccc..20ac67d59034 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -5340,6 +5340,7 @@ static void quirk_no_flr(struct pci_dev *dev)
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);
> > > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x7901, quirk_no_flr);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_no_flr);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_no_flr);
> > >
> > > --
> > > 2.39.1
> > >
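
For anyone running a kernel without this quirk, the manual workaround
quoted above could be wrapped roughly as follows. This is only a sketch:
the helper names has_fis_failure and force_bus_reset are made up for
illustration and are not part of any kernel tooling. It assumes the
device directory exposes a writable reset_method attribute and that you
run it as root, before unbinding the adapter and binding it to vfio-pci.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any kernel tooling.

# has_fis_failure: read kernel log text on stdin and succeed if it
# contains the symptom quoted in this thread.
has_fis_failure() {
    grep -qF 'softreset failed (1st FIS failed)'
}

# force_bus_reset DEVDIR: write "bus" to DEVDIR/reset_method.
# DEVDIR is the device's sysfs directory, i.e. the
# /sys/bus/pci/devices/<ID> path from the commit log.
force_bus_reset() {
    devdir=$1
    attr="$devdir/reset_method"

    # Bail out if the attribute is missing or not writable.
    if [ ! -w "$attr" ]; then
        echo "no writable reset_method in $devdir" >&2
        return 1
    fi

    echo "previous reset_method: $(cat "$attr")"
    echo bus > "$attr"
}
```

In the guest you could check for the symptom with
"dmesg | has_fis_failure", and on the host call force_bus_reset with the
adapter's sysfs directory as its only argument. With the no_flr quirk
applied, none of this should be needed for device 0x7901.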