RE: [PATCH 1/2] PCI/MSI: Cache the MSIX table size

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Marc Zyngier
> Sent: 22 January 2023 10:57
> 
> On Sun, 22 Jan 2023 09:00:04 +0000,
> Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> >
> > On Thu, Jan 19, 2023 at 07:06:32PM +0200, Alexander Shishkin wrote:
> > > A malicious device can change its MSIX table size between the table
> > > ioremap() and subsequent accesses, resulting in a kernel page fault in
> > > pci_write_msg_msix().
> > >
> > > To avoid this, cache the table size observed at the moment of table
> > > ioremap() and use the cached value. This, however, does not help drivers
> > > that peek at the PCIE_MSIX_FLAGS register directly.
> > >
> > > Signed-off-by: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
> > > Reviewed-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
> > > Cc: stable@xxxxxxxxxxxxxxx
> > > ---
> > >  drivers/pci/msi/api.c | 7 ++++++-
> > >  drivers/pci/msi/msi.c | 2 +-
> > >  include/linux/pci.h   | 1 +
> > >  3 files changed, 8 insertions(+), 2 deletions(-)
> >
> > I'm not security expert here, but not sure that this protects from anything.
> > 1. Kernel relies on working and not-malicious HW. There are gazillion ways
> > to cause crashes other than changing MSI-X.
> > 2. Device can report large table size, kernel will cache it and
> > malicious device will reduce it back. It is not handled and will cause
> > to kernel crash too.
> >
> 
> Indeed, this was my exact reaction reading this patch. This only makes
> sure the same (potentially wrong) value is used at all times. So while
> this results in a consistent use, this doesn't give much guarantee.

Yep, the device can 'choose' to error any PCIe read.

> The only way to deal with this is to actually handle the resulting
> fault, similar to what the kernel does when accessing userspace. Not
> sure how possible this is with something like PCIe.

I don't think you get a fault, the PCIe read completes with value ~0.
You might get an AER indication, that may not be helpful at all.
We've some x86 systems where that ends up as an NMI and panic!

A more valid reason for caching the MSIX table size is to avoid
doing a slow PCIe read.
I'm not sure how fast they are on 'normal' hardware, but the Altera
fpga we use is particularly pedestrian.
I just measured back-to-back reads at 126 clocks of the internal
125MHz clock so almost exactly 1us - or 3000 clocks of a 3GHz cpu.
(I added PCIe trace to the fpga so we can see what goes on.)

There is actually a much more 'interesting' issue with MSIX.
There are 16 bytes of data for each interrupt.

The kernel doesn't even try to ensure they are written as
a single PCIe TLP, and even if it did there is no real
guarantee the writes aren't split before the logic that
raises interrupts reads the values.

It is also quite likely that the interrupt raising logic
doesn't to an atomic read of all 16 bytes, so a cpu write
could split the reads.

This doesn't normally matter - the interrupt is enabled long
after the address/data fields are initialised.
But if the interrupt affinity is changed both the address and
data fields are likely to be changed on an interrupt that is
(nominally) enabled.

It is pretty easy to imagine the new address being used with
the old data (or v.v.) or even a 'torn read' of the 64bit
address field.

I don't remember seeing anything in the MSIX spec about
requirements on the hardware - which puts the onus on
the software to ensure the MSIX data is always valid.
This means that changing the vector needs to:
	Disable the interrupt.
	Delay (a read from the MSIX block is probably enough).
	Update the address and data.
	Delay.
	Enable the interrupt.
But I don't remember seeing the kernel to any of that.

	David.

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux