Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use

Christian König <ckoenig.leichtzumerken@xxxxxxxxx> · Wed, 25 Nov 2020 13:57:40 +0100

Am 25.11.20 um 11:40 schrieb Daniel Vetter:
On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
On 11/24/20 2:41 AM, Christian König wrote:
Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
On 11/23/20 3:41 PM, Christian König wrote:
Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
On 11/23/20 3:20 PM, Christian König wrote:
Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
On 11/25/20 5:42 AM, Christian König wrote:
Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
It's needed to drop iommu backed pages on device unplug
before device's IOMMU group is released.
It would be cleaner if we could do the whole
handling in TTM. I also need to double check
what you are doing with this function.

Christian.

Check patch "drm/amdgpu: Register IOMMU topology
notifier per device." to see
how I use it. I don't see why this should go
into the TTM mid-layer - the stuff I do inside
is vendor specific and also I don't think TTM is
explicitly aware of the IOMMU?
Do you mean you prefer the IOMMU notifier to be
registered from within TTM
and then use a hook to call into a vendor specific handler?
No, that is really vendor specific.

What I meant is to have a function like
ttm_resource_manager_evict_all() which you only need
to call and all tt objects are unpopulated.

So instead of this BO list I create and later iterate in
amdgpu from the IOMMU patch you just want to do it
within TTM with a single function? Makes much more sense.
Yes, exactly.

The list_empty() checks we have in TTM for the LRU are
actually not the best idea, we should now check the
pin_count instead. This way we could also have a list of the
pinned BOs in TTM.

So from my IOMMU topology handler I will iterate the TTM LRU for
the unpinned BOs and use this new function for the pinned ones?
It's probably a good idea to combine both iterations into this
new function to cover all the BOs allocated on the device.
Yes, that's what I had in my mind as well.
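
To make the idea concrete, here is a minimal userspace sketch of such a combined walk. It is only a model under stated assumptions: struct model_bo, struct model_device and all model_* functions are hypothetical names, not TTM API, and a real implementation would use the kernel list helpers and take the appropriate locks. List membership is decided by pin_count rather than by a list_empty() check, and one function covers both the LRU and the pinned list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object; not the real TTM structure. */
struct model_bo {
    unsigned int pin_count;   /* > 0 means the BO is pinned */
    bool populated;           /* true while backing pages are allocated */
    struct model_bo *next;
};

/* Hypothetical stand-in for the per-device bookkeeping. */
struct model_device {
    struct model_bo *lru;     /* unpinned BOs */
    struct model_bo *pinned;  /* pinned BOs */
};

/* Choose the list from pin_count instead of probing list membership. */
static void model_bo_add(struct model_device *dev, struct model_bo *bo)
{
    struct model_bo **head = bo->pin_count ? &dev->pinned : &dev->lru;

    bo->next = *head;
    *head = bo;
}

/* Drop the backing pages of every populated BO on one list. */
static size_t model_unpopulate_list(struct model_bo *bo)
{
    size_t n = 0;

    for (; bo; bo = bo->next) {
        if (bo->populated) {
            bo->populated = false;
            n++;
        }
    }
    return n;
}

/* Single entry point: covers pinned and unpinned BOs of the device. */
static size_t model_device_unpopulate_all(struct model_device *dev)
{
    return model_unpopulate_list(dev->lru) +
           model_unpopulate_list(dev->pinned);
}
```

In this model the IOMMU topology handler would only need the single call to model_device_unpopulate_all(); the return value is the number of BOs whose pages were dropped.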

BTW: Have you thought about what happens when we unpopulate
a BO while we still try to use a kernel mapping for it? That
could have unforeseen consequences.

Are you asking what happens to kmap or vmap style mapped CPU
accesses once we drop all the DMA backing pages for a particular
BO? Because for user mappings (mmap) we took care of this with
the dummy page reroute, but indeed nothing was done for in-kernel
CPU mappings.
Yes exactly that.

In other words what happens if we free the ring buffer while the
kernel still writes to it?

Christian.

While we can't control user application accesses to the mapped buffers
explicitly, and hence use page fault rerouting there,
I am thinking that in this case we may be able to sprinkle
drm_dev_enter/exit in any such sensitive place where we might
access a DMA buffer with the CPU from the kernel?
Yes, I fear we are going to need that.
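
A rough userspace sketch of that guard pattern, for illustration only. Everything named model_* is hypothetical. The real helpers have different signatures (drm_dev_enter() takes an int *idx out-parameter and drm_dev_exit() takes that idx) and use SRCU, so that drm_dev_unplug() waits for sections already in flight. This single-threaded model only shows the shape of the check around a ring buffer write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device with a kernel-mapped ring buffer. */
struct model_dev {
    bool unplugged;           /* set once the device is gone */
    bool ring_backed;         /* false after the backing pages are dropped */
    unsigned char ring[64];
};

/* Models drm_dev_enter(): refuse to enter once the device is unplugged. */
static bool model_dev_enter(struct model_dev *dev)
{
    return !dev->unplugged;
}

/* Models drm_dev_exit(): nothing to release in this simplified model. */
static void model_dev_exit(struct model_dev *dev)
{
    (void)dev;
}

/* Guarded CPU access: write to the ring only inside an enter/exit section. */
static bool model_ring_write(struct model_dev *dev, size_t off,
                             const void *src, size_t len)
{
    if (!model_dev_enter(dev))
        return false;         /* device gone, skip the access entirely */

    bool ok = dev->ring_backed && off <= sizeof(dev->ring) &&
              len <= sizeof(dev->ring) - off;
    if (ok)
        memcpy(dev->ring + off, src, len);

    model_dev_exit(dev);
    return ok;
}

/* Models the unplug path: mark the device gone, then drop the backing. */
static void model_dev_unplug(struct model_dev *dev)
{
    dev->unplugged = true;
    dev->ring_backed = false;
}
```

The point of the pattern is that every in-kernel CPU access to such a buffer bails out early after unplug instead of touching released pages.
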
Uh ... problem is that dma_buf_vmap mappings are usually permanent things.
Maybe we could stuff this into begin/end_cpu_access (but only for the
kernel, so a bit tricky)?

Oh very very good point! I haven't thought about DMA-buf mmaps in this 
context yet.

btw the other issue with dma-buf (and even worse with dma_fence) is
refcounting of the underlying drm_device. I'd expect that all your
callbacks go boom if the dma_buf outlives your drm_device. That part isn't
yet solved in your series here.

Well, thinking more about this, it seems to be another really good
argument why mapping pages from DMA-bufs into application address space
directly is a very bad idea :)

But yes, we essentially can't remove the device as long as there is a 
DMA-buf with mappings. No idea how to clean that one up.

Christian.

-Daniel

Things like CPU page table updates, ring buffer accesses and FW memcpy?
Are there other places?
Puh, good question. I have no idea.

Another point is that at this point the driver shouldn't access any such
buffers as we are in the process of finishing the device.
AFAIK there is no page fault mechanism for kernel mappings so I don't
think there is anything else to do?
Well there is a page fault handler for kernel mappings, but that one just
prints the stack trace into the system log and calls BUG(); :)

Long story short we need to avoid any access to released pages after unplug.
No matter if it's from the kernel or userspace.

Regards,
Christian.

Andrey
