Am 22.03.2018 um 08:14 schrieb Daniel Vetter: > On Wed, Mar 21, 2018 at 10:34:05AM +0100, Christian König wrote: >> Am 21.03.2018 um 09:18 schrieb Daniel Vetter: >>> [SNIP] >> For correct operation you always need to implement invalidate_range_end as >> well and add some lock/completion work Otherwise get_user_pages() can again >> grab the reference to the wrong page. > Is this really a problem? Yes, and quite a big one. > I figured that if a mmu_notifier invalidation is > going on, a get_user_pages on that mm from anywhere else (whether i915 or > anyone really) will serialize with the ongoing invalidate? No, that isn't correct. Jerome can probably better explain that than I do. > If that's not the case, then really any get_user_pages is racy, including all the > DIRECT_IO ones. The key point here is that get_user_pages() grabs a reference to the page. So what you get is a bunch of pages which where mapped at that location at a specific point in time. There is no guarantee that after get_user_pages() return you still have the same pages mapped at that point, you only guarantee that the pages are not reused for something else. That is perfectly sufficient for a task like DIRECT_IO where you can only have block or network I/O, but unfortunately not really for GPUs where you crunch of results, write them back to pages and actually count on that the CPU sees the result in the right place. >> [SNIP] >> So no matter how you put it i915 is clearly doing something wrong here :) > tbh I'm not entirely clear on the reasons why this works, but > cross-release lockdep catches these things, and it did not complain. > On a high-level we make sure that mm locks needed by get_user_pages do > _not_ nest within dev->struct_mutex. We have massive back-off slowpaths to > do anything that could fault outside of our own main gem locking. I'm pretty sure that this doesn't work as intended and just hides the real problem. > That was (at least in the past) a major difference with amdgpu, which > essentially has none of these paths. That would trivially deadlock with > your own gem mmap fault handler, so you had (maybe that changed) a dumb > retry loop, which did shut up lockdep but didn't fix any of the locking > inversions. Any lock you grab in an MMU callback can't be even held when you call kmalloc() or get_free_page() (without GFP_NOIO). Even simple things like drm_vm_open() violate that by using GFP_KERNEL. So I can 100% ensure you that what you do here is not correct. > So yeah, grabbing dev->struct_mutex is in principle totally fine while > holding all kinds of struct mm/vma locks. I'm not entirely clear why we > punt the actual unmapping to the worker though, maybe simply to not have a > constrained stack. I strongly disagree on that. As far as I can see what TTM does looks actually like the right approach to the problem. > This is re: your statement that you can't unamp sg tables from the > shrinker. We can, because we've actually untangled the locking depencies > so that you can fully operate on gem objects from within mm/vma locks. > Maybe code has changed, but last time I looked at radeon/ttm a while back > that was totally not the case, and if you don't do all this work then yes > you'll deadlock. > > Doen't mean it's not impossible, because we've done it :-) And I'm pretty sure you didn't do it correctly :D > Well, it actually gets the job done. We'd need to at least get to > per-object locking, and probably even then we'd need to rewrite the code a > lot. But please note that this here is only to avoid the GFP_NOIO > constraint, all the other bits I clarified around why we don't actually > have circular locking (because the entire hierarchy is inverted for us) > still hold even if you would only trylock here. Well you reversed your allocation and mmap_sem lock which avoids the lock inversion during page faults, but it doesn't help you at all with the MMU notifier and shrinker because then there are a lot more locks involved. Regards, Christian. > Aside: Given that yesterday a bunch of folks complained on #dri-devel that > amdgpu prematurely OOMs compared to i915, and that we've switched from a > simple trylock to this nastiness to be able to recover from more low > memory situation it's maybe not such a silly idea. Horrible, but not silly > because actually necessary. > -Daniel