On 2/5/21 12:59 PM, Daniel Vetter wrote:
On Fri, Feb 5, 2021 at 5:08 PM Andrey Grodzovsky
<Andrey.Grodzovsky@xxxxxxx> wrote:
+ Daniel
On 2/5/21 10:38 AM, Bjorn Helgaas wrote:
On Thu, Feb 04, 2021 at 11:03:10AM -0500, Andrey Grodzovsky wrote:
+ linux-pci mailing list and a bit of extra details bellow.
On 2/2/21 12:51 PM, Andrey Grodzovsky wrote:
Bjorn, Sergey I would also like to use this opportunity to ask you a
question on a related topic - Hot unplug.
I've been working for a while on this (see latest patchset set here
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Farchives%2Famd-gfx%2F2021-January%2F058595.html&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7Cb45e7ccc6a9d44746b4608d8c9ffdf6c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637481448110995519%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=9tZ1zY3t3jkclPsNB0LEUWiN3CGcjAFGUGgXjV1KK3I%3D&reserved=0).
My question is specifically regarding this patch
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Farchives%2Famd-gfx%2F2021-January%2F058606.html&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7Cb45e7ccc6a9d44746b4608d8c9ffdf6c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637481448110995519%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PqD048XLtkaAz6dKygr0rRTnFdwH0Y8WYM9JEYQ1zdA%3D&reserved=0
- the idea here is to
prevent any accesses to MMIO address ranges still mapped in kernel page
I think you're talking about a PCI BAR that is mapped into a user
process.
For user mappings, including MMIO mappings, we have a reliable approach where
we invalidate device address space mappings for all user on first sign of device
disconnect
and then on all subsequent page faults from the users accessing those ranges we
insert dummy zero page
into their respective page tables. It's actually the kernel driver, where no
page faulting can be used
such as for user space, I have issues on how to protect from keep accessing
those ranges which already are released
by PCI subsystem and hence can be allocated to another hot plugging device.
Hm what's the access here? At least on the drm side we should be able
to tear everything down when our ->remove callback is called.
Or I might also be missing too much of the thread context.
I mean we could have a bunch of background threads doing things and periodic
timers that poll different stuff (like temperature sensosr monitoring ,e.t.c)
and some stuff just might be in the middle of HW programming just as
the device was extracted
It is impossible to reliably *prevent* MMIO accesses to a BAR on a
PCI device that has been unplugged. There is always a window where
the CPU issues a load/store and the device is unplugged before the
load/store completes.
If a PCIe device is unplugged, an MMIO read to that BAR will complete
on PCIe with an uncorrectable fatal error. When that happens there is
no valid data from the PCIe device, so the PCIe host bridge typically
fabricates ~0 data so the CPU load instruction can complete.
If you want to reliably recover from unplugs like this, I think you
have to check for that ~0 data at the important points, i.e., where
you make decisions based on the data. Of course, ~0 may be valid data
in some cases. You have to use your knowledge of the device
programming model to determine whether ~0 is possible, and if it is,
check in some other way, like another MMIO read, to tell whether the
read succeeded and returned ~0, or failed because of PCIe error and
returned ~0.
Looks like there is a high performance price to pay for this approach if we
protect at every possible junction (e.g. register read/write accessors ), I
tested this by doing 1M read/writes while using drm_dev_enter/drm_dev_exit which
is DRM's RCU based guard against device unplug and even then we hit performance
penalty of 40%. I assume that with actual MMIO read (e.g. pci_device_is_present)
will cause a much larger performance
penalty.
Hm that's surprising much (for the SRCU check), but then checking for
every read/write really is a bit overkill I think.
-Daniel
So i guess there is no magic bullet to this after all and we back to some kind
of gurds around the sensitive stuff such as blocking all GPU SW schedulers,
refusing father direct IB submissions to HW rings, rejecting all IOCTls.
Andrey
Andrey
table by the driver AFTER the device is gone physically and from the
PCI subsystem, post pci_driver.remove
call back execution. This happens because struct device (and struct
drm_device representing the graphic card) are still present because some
user clients which are not aware
of hot removal still hold device file open and hence prevents device
refcount to drop to 0. The solution in this patch is brute force where
we try and find any place we access MMIO
mapped to kernel address space and guard against the write access with a
'device-unplug' flag. This solution is obliviously racy because a device
can be unplugged right after checking the falg.
I had an idea to instead not release but to keep those ranges reserved
even after pci_driver.remove, until DRM device is destroyed when it's
refcount drops to 0 and by this to prevent new devices plugged in
and allocated some of the same MMIO address range to get accidental
writes into their registers.
But, on dri-devel I was advised that this will upset the PCI subsystem
and so best to be avoided but I still would like another opinion from
PCI experts on whether this approach is indeed not possible ?
Andrey