Re: [PATCH kvmtool] pci: Disable writes to Status register

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 13, 2022 at 02:52:44PM +0100, Alexandru Elisei wrote:
> Hi,
> 
> On Thu, Sep 08, 2022 at 03:42:09PM +0100, Jean-Philippe Brucker wrote:
> > Although the PCI Status register only contains read-only and
> > write-1-to-clear bits, we currently keep anything written there, which
> > can confuse a guest.
> > 
> > The problem was highlighted by recent Linux commit 6cd514e58f12 ("PCI:
> > Clear PCI_STATUS when setting up device"), which unconditionally writes
> > 0xffff to the Status register in order to clear pending errors. Then the
> > EDAC driver sees the parity status bits set and attempts to clear them
> > by writing 0xc100, which in turn clears the Capabilities List bit.
> > Later on, when the virtio-pci driver starts probing, it assumes due to
> > missing capabilities that the device is using the legacy transport, and
> > fails to setup the device because of mismatched protocol.
> > 
> > Filter writes to the config space, keeping only those to writable
> > fields. Tighten the access size check while we're at it, to prevent
> > overflow. This is only a small step in the right direction, not a
> > foolproof solution, because a guest could still write both Command and
> > Status registers using a single 32-bit write. More work is needed for:
> > * Supporting arbitrary sized writes.
> > * Sanitizing accesses to capabilities, which are device-specific.
> > * Fine-grained filtering of the Command register, where only some bits
> >   are writable.
> 
> I'm confused here. Why not do value &= mask to keep only those bits that
> writable?

Sure, I can add it

> 
> > 
> > Also remove the old hack that filtered accesses. It was wrong and not
> > properly explained in the git history, but whatever it was guarding
> > against should be prevented by these new checks.
> 
> If I remember correctly, that was guarding against the guest kernel poking
> the ROM base address register for drivers that assumed that the ROM was
> always there, I vaguely remember that was the case with GPUs. Pairs with
> the similar check in the vfio callback, vfio_pci_cfg_write().

Right, makes sense. I think that's what I assumed when rewriting
pci__config_wr() hence the current comment but the original commits didn't
say anything about it.

> 
> > 
> > Reported-by: Pierre Gondois <pierre.gondois@xxxxxxx>
> > Signed-off-by: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> > ---
> > Note that the issue described here only shows up during ACPI boot for
> > me, because edac_init() happens after PCI enumeration. With DT boot,
> > edac_pci_clear_parity_errors() runs earlier and doesn't find any device.
> > ---
> >  pci.c | 41 ++++++++++++++++++++++++++++++++---------
> >  1 file changed, 32 insertions(+), 9 deletions(-)
> > 
> > diff --git a/pci.c b/pci.c
> > index a769ae27..84dc7d1d 100644
> > --- a/pci.c
> > +++ b/pci.c
> > @@ -350,6 +350,24 @@ static void pci_config_bar_wr(struct kvm *kvm,
> >  	pci_activate_bar_regions(kvm, old_addr, bar_size);
> >  }
> >  
> > +/*
> > + * Bits that are writable in the config space header.
> > + * Write-1-to-clear Status bits are missing since we never set them.
> > + */
> > +static const u8 pci_config_writable[PCI_STD_HEADER_SIZEOF] = {
> > +	[PCI_COMMAND] =
> > +		PCI_COMMAND_IO |
> > +		PCI_COMMAND_MEMORY |
> > +		PCI_COMMAND_MASTER |
> > +		PCI_COMMAND_PARITY,
> > +	[PCI_COMMAND + 1] =
> > +		(PCI_COMMAND_SERR |
> > +		 PCI_COMMAND_INTX_DISABLE) >> 8,
> > +	[PCI_INTERRUPT_LINE] = 0xff,
> > +	[PCI_BASE_ADDRESS_0 ... PCI_BASE_ADDRESS_5 + 3] = 0xff,
> > +	[PCI_CACHE_LINE_SIZE] = 0xff,
> > +};
> > +
> >  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
> >  {
> >  	void *base;
> > @@ -357,7 +375,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
> >  	u16 offset;
> >  	struct pci_device_header *pci_hdr;
> >  	u8 dev_num = addr.device_number;
> > -	u32 value = 0;
> > +	u32 value = 0, mask = 0;
> >  
> >  	if (!pci_device_exists(addr.bus_number, dev_num, 0))
> >  		return;
> > @@ -368,12 +386,12 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
> >  	if (pci_hdr->cfg_ops.write)
> >  		pci_hdr->cfg_ops.write(kvm, pci_hdr, offset, data, size);
> >  
> > -	/*
> > -	 * legacy hack: ignore writes to uninitialized regions (e.g. ROM BAR).
> > -	 * Not very nice but has been working so far.
> > -	 */
> > -	if (*(u32 *)(base + offset) == 0)
> > -		return;
> > +	/* We don't sanity-check capabilities for the moment */
> > +	if (offset < PCI_STD_HEADER_SIZEOF) {
> > +		memcpy(&mask, pci_config_writable + offset, size);
> > +		if (!mask)
> > +			return;
> 
> Shouldn't this be performed before the VFIO callbacks?

Yes I think I can move it up

> Also, the vfio callbacks still do the writes to the VFIO in-kernel PCI
> header, but now kvmtool would skip those writes entirely. Shouldn't
> kvmtool's view of the configuration space be identical to that of VFIO?

VFIO also skips writes to read-only fields, so they should now be more in
sync than before :) But their views are already desynchronized, because
kvmtool doesn't read back the config space virtualized by VFIO after
writing to it. We should probably improve it, but that's also for a future
patch.

> 
> > +	}
> >  
> >  	if (offset == PCI_COMMAND) {
> >  		memcpy(&value, data, size);
> > @@ -419,8 +437,13 @@ static void pci_config_mmio_access(struct kvm_cpu *vcpu, u64 addr, u8 *data,
> >  	cfg_addr.w		= (u32)addr;
> >  	cfg_addr.enable_bit	= 1;
> >  
> > -	if (len > 4)
> > -		len = 4;
> > +	/*
> > +	 * "Root Complex implementations are not required to support the
> > +	 * generation of Configuration Requests from accesses that cross DW
> > +	 * [4 bytes] boundaries."
> > +	 */
> > +	if ((addr & 3) + len > 4)
> > +		return;
> 
> Isn't that a change in behaviour?

Yes, but it should be safe to change since it is implementation defined.
According to the spec 64-bit config space writes through ECAM are not
expected to work and I find the old behaviour, truncating the write, worse
than rejecting the whole thing. It looks like at least linux, freebsd,
u-boot and edk2 don't issue 64-bit writes.

Thanks,
Jean

> How about:
> 
>     len = 4 - (addr & 3);
> 
> Which should conform to the spec, but still allow writes like before.
> 
> Thanks,
> Alex
> 
> >  
> >  	if (is_write)
> >  		pci__config_wr(kvm, cfg_addr, data, len);
> > -- 
> > 2.37.3
> > 



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux