Re: [PATCH 0/5] drm/vmwgfx: Support module unload and hotunplug

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 08.10.21 um 23:13 schrieb Thomas Hellström:
On Fri, 2021-10-08 at 20:40 +0000, Zack Rusin wrote:
On Fri, 2021-10-08 at 22:28 +0200, Thomas Hellström wrote:
On Fri, 2021-10-08 at 13:31 -0400, Zack Rusin wrote:
This is a largely trivial set that makes vmwgfx support module
reload
and PCI hot-unplug. It also makes IGT's core_hotunplug pass
instead
of kernel oops'ing.

The one "ugly" change is the "Introduce a new placement for MOB
page
tables". It seems vmwgfx has been violating a TTM assumption that
TTM_PL_SYSTEM buffers are never fenced for a while. Apart from a
kernel
oops on module unload it didn't seem to wreak too much havoc, but
we
shouldn't be abusing TTM. So to solve it we're introducing a new
placement, which is basically system, but can deal with fenced
bo's.

Cc: Christian König <christian.koenig@xxxxxxx>
Cc: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>
Hi, Zack,

What part of TTM doesn't allow fenced system memory currently? It
was
certainly designed to allow that and vmwgfx has been relying on
that
since the introduction of MOBs IIRC. Also i915 is currently relying
on
that.
It's the shutdown. BO's allocated through the ttm system manager
might
be busy during ttm_bo_put which results in them being scheduled for a
delayed deletion. The ttm system manager is disabled before the final
delayed deletion is ran in ttm_device_fini. This results in crashes
during freeing of the BO resources because they're trying to remove
themselves from a no longer existent ttm_resource_manager (e.g. in
IGT's core_hotunplug on vmwgfx)

During review of the trivial patch that was fixing it in ttm
Christian
said that system domain buffers must be idle or otherwise a number of
assumptions in ttm breaks:
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Farchives%2Fdri-devel%2F2021-September%2F324027.html&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C2391a82208e6464c8db208d98aa08dd2%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637693244449717755%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=BZ3C00rZDDdpKNoGa0PYwoHeM89uVzN1Md4iN2qkGB0%3D&amp;reserved=0
And later clarified that in fact system domain buffers being fenced
is
illegal from a design point of view:
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Farchives%2Fdri-devel%2F2021-September%2F324697.html&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C2391a82208e6464c8db208d98aa08dd2%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637693244449717755%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=3eXNqeh7Ifqe6lllRMvdfJX%2F7rX7%2FqH3wldNE5AodMc%3D&amp;reserved=0
Hmm, this looks very odd, because I remember reminding Christian as
late as this spring that both vmwgfx and i915 sets up GPU bindings to
system buffers, as part of the review of a patch series pushing a
couple of changes to the swapout path that apparently had missed this.

Well that was the trigger to look more deeply into this and as far as I can tell TTM was never capable of doing this correctly.

This more sounds like there have been changes to TTM happening not
taking into account or knowing that TTM was designed for system buffers
bound to GPU and that there were drivers actually doing that.

And there is still older code around clearly implying system buffers
can be fenced, like ttm_bo_swapout(), and that there is dma fences
signaling completion on work that has never touched the GPU, not to
mention async eviction where a bo may be evicted to system but has tons
of outstanding fenced work in the pipe.

So if there has been a design change WRT this I believe it should have
been brought up on dri-devel to have it properly discussed so affected
drivers could agree on the different options.

Perhaps Christian can enlighten us here. Christian?

There are multiple occasions where we assume that BOs in the system domain are not accessible by the GPU, swapout and teardown are just two examples.

When Dave reorganized the buffer move code we also had to insert waits for moves to complete for anything which goes into the SYSTEM domain.

Apart from that it certainly makes sense to have that restriction. Memory which is accessed by the hardware and not directly evictable must be accounted differently.

Regards,
Christian.


/Thomas


z





[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux