Although the PCI Status register only contains read-only and write-1-to-clear bits, we currently keep anything written there, which can confuse a guest. The problem was highlighted by recent Linux commit 6cd514e58f12 ("PCI: Clear PCI_STATUS when setting up device"), which unconditionally writes 0xffff to the Status register in order to clear pending errors. Then the EDAC driver sees the parity status bits set and attempts to clear them by writing 0xc100, which in turn clears the Capabilities List bit. Later on, when the virtio-pci driver starts probing, it assumes due to missing capabilities that the device is using the legacy transport, and fails to setup the device because of mismatched protocol. Filter writes to the config space, keeping only those to writable fields. Tighten the access size check while we're at it, to prevent overflow. This is only a small step in the right direction, not a foolproof solution, because a guest could still write both Command and Status registers using a single 32-bit write. More work is needed for: * Supporting arbitrary sized writes. * Sanitizing accesses to capabilities, which are device-specific. * Fine-grained filtering of the Command register, where only some bits are writable. Also remove the old hack that filtered accesses. It was wrong and not properly explained in the git history, but whatever it was guarding against should be prevented by these new checks. Reported-by: Pierre Gondois <pierre.gondois@xxxxxxx> Signed-off-by: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx> --- Note that the issue described here only shows up during ACPI boot for me, because edac_init() happens after PCI enumeration. With DT boot, edac_pci_clear_parity_errors() runs earlier and doesn't find any device. --- pci.c | 41 ++++++++++++++++++++++++++++++++--------- 1 file changed, 32 insertions(+), 9 deletions(-) diff --git a/pci.c b/pci.c index a769ae27..84dc7d1d 100644 --- a/pci.c +++ b/pci.c @@ -350,6 +350,24 @@ static void pci_config_bar_wr(struct kvm *kvm, pci_activate_bar_regions(kvm, old_addr, bar_size); } +/* + * Bits that are writable in the config space header. + * Write-1-to-clear Status bits are missing since we never set them. + */ +static const u8 pci_config_writable[PCI_STD_HEADER_SIZEOF] = { + [PCI_COMMAND] = + PCI_COMMAND_IO | + PCI_COMMAND_MEMORY | + PCI_COMMAND_MASTER | + PCI_COMMAND_PARITY, + [PCI_COMMAND + 1] = + (PCI_COMMAND_SERR | + PCI_COMMAND_INTX_DISABLE) >> 8, + [PCI_INTERRUPT_LINE] = 0xff, + [PCI_BASE_ADDRESS_0 ... PCI_BASE_ADDRESS_5 + 3] = 0xff, + [PCI_CACHE_LINE_SIZE] = 0xff, +}; + void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size) { void *base; @@ -357,7 +375,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, u16 offset; struct pci_device_header *pci_hdr; u8 dev_num = addr.device_number; - u32 value = 0; + u32 value = 0, mask = 0; if (!pci_device_exists(addr.bus_number, dev_num, 0)) return; @@ -368,12 +386,12 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, if (pci_hdr->cfg_ops.write) pci_hdr->cfg_ops.write(kvm, pci_hdr, offset, data, size); - /* - * legacy hack: ignore writes to uninitialized regions (e.g. ROM BAR). - * Not very nice but has been working so far. - */ - if (*(u32 *)(base + offset) == 0) - return; + /* We don't sanity-check capabilities for the moment */ + if (offset < PCI_STD_HEADER_SIZEOF) { + memcpy(&mask, pci_config_writable + offset, size); + if (!mask) + return; + } if (offset == PCI_COMMAND) { memcpy(&value, data, size); @@ -419,8 +437,13 @@ static void pci_config_mmio_access(struct kvm_cpu *vcpu, u64 addr, u8 *data, cfg_addr.w = (u32)addr; cfg_addr.enable_bit = 1; - if (len > 4) - len = 4; + /* + * "Root Complex implementations are not required to support the + * generation of Configuration Requests from accesses that cross DW + * [4 bytes] boundaries." + */ + if ((addr & 3) + len > 4) + return; if (is_write) pci__config_wr(kvm, cfg_addr, data, len); -- 2.37.3