On 16.12.20 at 17:13, Andrey Grodzovsky wrote:
On 12/16/20 9:21 AM, Daniel Vetter wrote:
On Wed, Dec 16, 2020 at 9:04 AM Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
On 15.12.20 at 21:18, Andrey Grodzovsky wrote:
[SNIP]
While we can't control user application accesses to the mapped
buffers explicitly, and hence use page fault rerouting for them,
I am thinking that in this case we may be able to sprinkle
drm_dev_enter/exit in any such sensitive place where we might
access a DMA buffer with the CPU from the kernel?
Yes, I fear we are going to need that.
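
For illustration, the guard pattern discussed here can be modelled in
userspace. This is only a sketch under stated assumptions: the names
model_dev, model_dev_enter, model_dev_exit, model_dev_unplug and
guarded_write are hypothetical, and the reader counter stands in for the
SRCU-based synchronization the real DRM helpers use. In the kernel,
drm_dev_enter() takes a struct drm_device * plus an int *idx and
drm_dev_exit() takes that idx back.

```c
/* Userspace model of an enter/exit guard around CPU access to a buffer.
 * All model_* names are hypothetical; this is not the DRM implementation. */
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct model_dev {
    atomic_bool unplugged; /* set once by the unplug path */
    atomic_int readers;    /* sections currently between enter and exit */
};

/* Returns true if the caller may touch device-backed memory. */
static bool model_dev_enter(struct model_dev *dev)
{
    /* Announce ourselves first, then check the flag, so that unplug
     * either sees our count or we see its flag. */
    atomic_fetch_add(&dev->readers, 1);
    if (atomic_load(&dev->unplugged)) {
        atomic_fetch_sub(&dev->readers, 1);
        return false;
    }
    return true;
}

static void model_dev_exit(struct model_dev *dev)
{
    atomic_fetch_sub(&dev->readers, 1);
}

/* Mark the device gone and wait until every guarded section has left.
 * Only after this returns would it be safe to release the backing pages. */
static void model_dev_unplug(struct model_dev *dev)
{
    atomic_store(&dev->unplugged, true);
    while (atomic_load(&dev->readers) > 0)
        ; /* busy-wait; the kernel sleeps in synchronize_srcu() instead */
}

/* Example of a guarded CPU access, e.g. a ring buffer write or FW memcpy. */
static int guarded_write(struct model_dev *dev, void *dst,
                         const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (!model_dev_enter(dev))
        return -ENODEV; /* device is gone: skip the access entirely */
    memcpy(dst, src, len);
    model_dev_exit(dev);
    return 0;
}
```

A caller would wrap every kernel-side CPU access to GTT or VRAM in such a
section; after model_dev_unplug() returns, guarded_write() fails with
-ENODEV and leaves the destination untouched.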
Things like CPU page table updates, ring buffer accesses and FW
memcpy? Are there other places?
Phew, good question. I have no idea.
Another point is that at this stage the driver shouldn't access any
such buffers, as we are in the process of tearing the device down.
AFAIK there is no page fault mechanism for kernel mappings, so I
don't think there is anything else to do?
Well there is a page fault handler for kernel mappings, but that one
just prints the stack trace into the system log and calls BUG(); :)
Long story short we need to avoid any access to released pages after
unplug. No matter if it's from the kernel or userspace.
I was just about to start guarding CPU accesses from the kernel to GTT
or VRAM buffers with drm_dev_enter/exit, but then I looked more at the
code and it seems ttm_tt_unpopulate just deletes the DMA mappings (for
the sake of device to main memory access). The kernel page table is not
touched until the last BO refcount is dropped and the BO is released
(ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This
holds both for GTT BOs mapped into the kernel by kmap (or vmap) and for
VRAM BOs mapped by ioremap. So as I see it, nothing bad will happen
after we unpopulate a BO while we still try to use a kernel mapping for
it: the system memory pages backing GTT BOs are still mapped and not
freed, and for VRAM BOs the same holds for the IO physical ranges mapped
into the kernel page table, since iounmap wasn't called yet.
The problem is that the system pages would be freed, and if the kernel
driver still happily writes to them we are pretty much busted because
we write to freed memory.
OK, I see I missed ttm_tt_unpopulate->..->ttm_pool_free, which will
release the GTT BO pages. But then isn't there a problem in
ttm_bo_release, since ttm_bo_cleanup_memtype_use, which also leads to
the pages being released, comes before bo->destroy, which unmaps the
pages from the kernel page table? Won't we end up writing to freed
memory in this time interval? Don't we need to postpone freeing the
pages until after the kernel page table unmapping?
BOs are only destroyed when there is a guarantee that nobody is
accessing them any more.
The problem here is that the pages as well as the VRAM can be
immediately reused after the hotplug event.
Similar for VRAM: if this is an actual hotunplug and then replug,
there's most likely going to be a different device behind the same MMIO
BAR range (the higher bridges all still have the same windows assigned),
No idea how this actually works, but if we haven't called iounmap yet,
doesn't it mean that those physical ranges that are still mapped into
the page table should be reserved and cannot be reused for another
device? As a guess, maybe another subrange from the higher bridge's
total range will be allocated.
Nope, the PCIe subsystem doesn't care about any ioremap still active for
a range when it is hotplugged.
and that's bad news if we keep using it for current drivers. So we
really have to point all these CPU PTEs to some other place.
We can't just unmap it without syncing against any in-kernel accesses
to those buffers, and since the page faulting technique we use for
user mapped buffers does not seem to be possible for kernel mapped
buffers, I am not sure how to do it gracefully...
We could try to replace the kmap with a dummy page under the hood, but
that is extremely tricky.
Especially since BOs which are just 1 page in size could point to the
linear mapping directly.
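
As a rough userspace analogue of the dummy page idea (not kernel code,
and not something proposed in this thread), an existing mapping can be
swapped for a fresh zero page at the same address with mmap() and
MAP_FIXED, so later accesses through the old pointer stay valid but no
longer reach the original backing. The helper names map_buffer_page and
redirect_to_dummy_page are hypothetical. The sketch does not address the
synchronization against in-flight accesses raised above, nor the case
where a one-page BO points at the linear mapping.

```c
/* Userspace analogue only: replace the backing of a mapped page while
 * keeping the virtual address valid. Helper names are hypothetical. */
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one anonymous, zero-filled page standing in for a buffer mapping. */
static unsigned char *map_buffer_page(size_t *page_size)
{
    long ps = sysconf(_SC_PAGESIZE);
    void *p;

    assert(page_size != NULL);
    if (ps <= 0)
        return NULL;
    p = mmap(NULL, (size_t)ps, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *page_size = (size_t)ps;
    return p;
}

/* Swap whatever backs addr for a fresh zero page. MAP_FIXED replaces the
 * old mapping in place, so addr never becomes unmapped in between. */
static int redirect_to_dummy_page(unsigned char *addr, size_t page_size)
{
    void *p = mmap(addr, page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    return p == (void *)addr ? 0 : -1;
}
```

After redirect_to_dummy_page() succeeds, reads through the old pointer
return zeros and writes land in the dummy page instead of the original
memory.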
Christian.
Andrey
-Daniel
Christian.
I loaded the driver with vm_update_mode=3, meaning all VM updates are
done using the CPU, and haven't seen any oops after removing the
device. I guess I can test it more by allocating GTT and VRAM BOs
and trying to read/write to them after the device is removed.
Andrey
Regards,
Christian.
Andrey
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx