[Linaro-mm-sig] [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2

ckoenig.leichtzumerken@xxxxxxxxx (Christian König) · Thu, 22 Mar 2018 10:37:55 +0100

Am 22.03.2018 um 08:14 schrieb Daniel Vetter:
> On Wed, Mar 21, 2018 at 10:34:05AM +0100, Christian KÃ¶nig wrote:
>> Am 21.03.2018 um 09:18 schrieb Daniel Vetter:
>>> [SNIP]
>> For correct operation you always need to implement invalidate_range_end as
>> well and add some lock/completion work Otherwise get_user_pages() can again
>> grab the reference to the wrong page.
> Is this really a problem?

Yes, and quite a big one.

> I figured that if a mmu_notifier invalidation is
> going on, a get_user_pages on that mm from anywhere else (whether i915 or
> anyone really) will serialize with the ongoing invalidate?

No, that isn't correct. Jerome can probably better explain that than I do.

> If that's not the case, then really any get_user_pages is racy, including all the
> DIRECT_IO ones.

The key point here is that get_user_pages() grabs a reference to the 
page. So what you get is a bunch of pages which where mapped at that 
location at a specific point in time.

There is no guarantee that after get_user_pages() return you still have 
the same pages mapped at that point, you only guarantee that the pages 
are not reused for something else.

That is perfectly sufficient for a task like DIRECT_IO where you can 
only have block or network I/O, but unfortunately not really for GPUs 
where you crunch of results, write them back to pages and actually count 
on that the CPU sees the result in the right place.

>> [SNIP]
>> So no matter how you put it i915 is clearly doing something wrong here :)
> tbh I'm not entirely clear on the reasons why this works, but
> cross-release lockdep catches these things, and it did not complain.
> On a high-level we make sure that mm locks needed by get_user_pages do
> _not_ nest within dev->struct_mutex. We have massive back-off slowpaths to
> do anything that could fault outside of our own main gem locking.

I'm pretty sure that this doesn't work as intended and just hides the 
real problem.

> That was (at least in the past) a major difference with amdgpu, which
> essentially has none of these paths. That would trivially deadlock with
> your own gem mmap fault handler, so you had (maybe that changed) a dumb
> retry loop, which did shut up lockdep but didn't fix any of the locking
> inversions.

Any lock you grab in an MMU callback can't be even held when you call 
kmalloc() or get_free_page() (without GFP_NOIO).

Even simple things like drm_vm_open() violate that by using GFP_KERNEL. 
So I can 100% ensure you that what you do here is not correct.

> So yeah, grabbing dev->struct_mutex is in principle totally fine while
> holding all kinds of struct mm/vma locks. I'm not entirely clear why we
> punt the actual unmapping to the worker though, maybe simply to not have a
> constrained stack.

I strongly disagree on that. As far as I can see what TTM does looks 
actually like the right approach to the problem.

> This is re: your statement that you can't unamp sg tables from the
> shrinker. We can, because we've actually untangled the locking depencies
> so that you can fully operate on gem objects from within mm/vma locks.
> Maybe code has changed, but last time I looked at radeon/ttm a while back
> that was totally not the case, and if you don't do all this work then yes
> you'll deadlock.
>
> Doen't mean it's not impossible, because we've done it :-)

And I'm pretty sure you didn't do it correctly :D

> Well, it actually gets the job done. We'd need to at least get to
> per-object locking, and probably even then we'd need to rewrite the code a
> lot. But please note that this here is only to avoid the GFP_NOIO
> constraint, all the other bits I clarified around why we don't actually
> have circular locking (because the entire hierarchy is inverted for us)
> still hold even if you would only trylock here.

Well you reversed your allocation and mmap_sem lock which avoids the 
lock inversion during page faults, but it doesn't help you at all with 
the MMU notifier and shrinker because then there are a lot more locks 
involved.

Regards,
Christian.

> Aside: Given that yesterday a bunch of folks complained on #dri-devel that
> amdgpu prematurely OOMs compared to i915, and that we've switched from a
> simple trylock to this nastiness to be able to recover from more low
> memory situation it's maybe not such a silly idea. Horrible, but not silly
> because actually necessary.
> -Daniel