Quoting Janusz Krzysztofik (2020-02-13 16:28:50) > Hi Michał, > > On Thursday, February 13, 2020 3:37:31 PM CET Michał Winiarski wrote: > > On Thu, Feb 13, 2020 at 01:46:41PM +0100, Janusz Krzysztofik wrote: > > > map-fixed-invalidate* subtests utilize gem_set_tiling() which may fail, > > > e.g. on hardware with no mappable aperture, due to missing fences. > > > Skip those subtests if fences are not available. > > > > > > Moreover, those subtests use GEM_MMAP_GTT IOCTL which may also fail, > > > e.g. on hardware with no mappable aperture. Use GEM_MMAP_OFFSET > > > instead and iterate MMAP_OFFSET coherent types to find one that works. > > > > > > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@xxxxxxxxxxxxxxx> > > > Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx> > > > --- > > > Hi Michał, > > > > > > As you are the author of those subtests, let me ask you a few questions > > > I'm not sure about: > > > 1. How critical is the use of gem_set_tiling() to those subtests? Can > > > we just skip those operations if not supported? If not, can you > > > propose a replacement that should work on hardware with no mappable > > > aperture? > > > > It's a regression test, see: > > e4b946bfe1e3 ("drm/i915: Fix userptr deadlock with aliased GTT mmappings") > > > > The idea was to cause a deadlock by triggerring i915_gem_object_release_mmap > > from set_tiling underneath struct_mutex, which would then cause invalidate on a > > stale userptr (which would also attempt to acquire struct mutex - classic > > recursive lock). Original trace which served as an example to write the IGT: > > > > [23106.066196] [drm:i915_gem_open] > > [23279.022598] INFO: task test_api64:1359 blocked for more than 120 seconds. > > [23279.032453] Tainted: G U 4.1.0-rc7+ #16 > > [23279.039440] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > [23279.049191] test_api64 D ffff880146837a18 12760 1359 530 0x00000000 > > [23279.058101] ffff880146837a18 ffff8801468379d8 ffffffff8101efcd ffff880147e22e20 > > [23279.067372] ffff8800a34b8000 ffff8801468379f8 ffff880146838000 00000000ffffffff > > [23279.076599] ffff8801459911b0 0000000000000246 ffff8801459911a8 ffff880146837a38 > > [23279.085923] Call Trace: > > [23279.089510] [<ffffffff8101efcd>] ? native_sched_clock+0x2d/0x90 > > [23279.097223] [<ffffffff817a34a7>] schedule+0x37/0x90 > > [23279.103662] [<ffffffff817a389e>] schedule_preempt_disabled+0xe/0x10 > > [23279.111665] [<ffffffff817a5a05>] mutex_lock_nested+0x165/0x3d0 > > [23279.119287] [<ffffffffa01f7fac>] ? cancel_userptr+0x2c/0x150 [i915] > > [23279.127376] [<ffffffff817abf88>] ? ftrace_graph_caller+0x78/0xa8 > > [23279.135204] [<ffffffffa01f7fac>] ? cancel_userptr+0x2c/0x150 [i915] > > [23279.143329] [<ffffffffa01f7fac>] cancel_userptr+0x2c/0x150 [i915] > > [23279.151263] [<ffffffff817abfb8>] ftrace_graph_caller+0xa8/0xa8 > > [23279.158801] [<ffffffffa01f8211>] i915_gem_userptr_mn_invalidate_range_start+0x141/0x200 [i915] > > [23279.169559] [<ffffffffa01f80d5>] ? i915_gem_userptr_mn_invalidate_range_start+0x5/0x200 [i915] > > [23279.180364] [<ffffffff817abfb8>] ftrace_graph_caller+0xa8/0xa8 > > [23279.187919] [<ffffffff8120fbb3>] __mmu_notifier_invalidate_range_start+0x83/0xd0 > > [23279.197342] [<ffffffff8120fb35>] ? __mmu_notifier_invalidate_range_start+0x5/0xd0 > > [23279.206804] [<ffffffff811e9036>] zap_page_range_single+0xf6/0x120 > > [23279.214727] [<ffffffff811e9114>] ? unmap_mapping_range+0x64/0x140 > > [23279.222515] [<ffffffff811e9114>] ? unmap_mapping_range+0x64/0x140 > > [23279.230489] [<ffffffff811e9114>] ? unmap_mapping_range+0x64/0x140 > > [23279.238442] [<ffffffff811e91ca>] unmap_mapping_range+0x11a/0x140 > > [23279.246141] [<ffffffffa01ecba5>] ? i915_gem_release_mmap+0x5/0x20 [i915] > > [23279.254753] [<ffffffffa01eb866>] i915_gem_release_mmap.part.44+0x46/0x60 [i915] > > [23279.264002] [<ffffffffa01ecbb9>] ? i915_gem_release_mmap+0x19/0x20 [i915] > > [23279.272723] [<ffffffff817abfb8>] ftrace_graph_caller+0xa8/0xa8 > > [23279.280239] [<ffffffffa01ecbb9>] i915_gem_release_mmap+0x19/0x20 [i915] > > [23279.288780] [<ffffffff817abfb8>] ftrace_graph_caller+0xa8/0xa8 > > [23279.296323] [<ffffffffa01f6df4>] i915_gem_set_tiling+0x204/0x540 [i915] > > [23279.304854] [<ffffffff817abfb8>] ftrace_graph_caller+0xa8/0xa8 > > [23279.312559] [<ffffffffa0119a8f>] drm_ioctl+0x12f/0x620 [drm] > > [23279.319986] [<ffffffffa01f6bf0>] ? i915_gem_detect_bit_6_swizzle+0x200/0x200 [i915] > > [23279.329744] [<ffffffff811eac5e>] ? handle_mm_fault+0xe1e/0x1780 > > [23279.337380] [<ffffffff8101efcd>] ? native_sched_clock+0x2d/0x90 > > [23279.345081] [<ffffffff8101f039>] ? sched_clock+0x9/0x10 > > [23279.352029] [<ffffffff810d3fd5>] ? local_clock+0x25/0x30 > > [23279.359059] [<ffffffff810ef31f>] ? lock_release_holdtime.part.25+0xf/0x1f0 > > [23279.367914] [<ffffffff8124fc96>] do_vfs_ioctl+0x2c6/0x4f0 > > [23279.375068] [<ffffffff810edf13>] ? up_read+0x23/0x40 > > [23279.381794] [<ffffffff81068cfc>] ? __do_page_fault+0x1bc/0x450 > > [23279.389356] [<ffffffff8124ff41>] SyS_ioctl+0x81/0xa0 > > [23279.396025] [<ffffffff817a992e>] system_call_fastpath+0x12/0x76 > > [23279.403823] 4 locks held by test_api64/1359: > > [23279.409505] #0: (&dev->struct_mutex){......}, at: [<ffffffffa01f6ce6>] i915_gem_set_tiling+0xf6/0x540 [i915] > > [23279.421939] #1: (&mapping->i_mmap_rwsem){......}, at: [<ffffffff811e9114>] unmap_mapping_range+0x64/0x140 > > [23279.434071] #2: (&srcu){......}, at: [<ffffffff8120fb35>] __mmu_notifier_invalidate_range_start+0x5/0xd0 > > [23279.446103] #3: (&dev->struct_mutex){......}, at: [<ffffffffa01f7fac>] cancel_userptr+0x2c/0x150 [i915] > > > > Since then, i915 has changed a lot... (no more struct mutex), so it's hard to > > immediately say whether there's still any deadlock potential there. > > Looking at CI results, still something similar, with > userptr_mn_invalidate_range_start() involved, can be observed, but only on That's the equivalent cycle to the original report. We avoid it on full-ppgtt by separating the lock classes for GGTT and ppGTT; and as the cycle is on invalidate_range -> object_unbind -> revoke_mmap -> invalidate_range it only applies if we take a pagefault on a userptr from a mmap. I haven't seen a way we can avoid it for snb and its ilk, yet. Now since with mmap_offset we do revoke_mmap on not just GGTT vma, but as we remove all the pages, we should be seeing the same lockcycles for userptr + mmap_offset on all as it should cycle through obj->mm.lock. (Which is why we took the preventative step of disallowing mixing mmap_offset and userptr, and userptr cannot find a page for a mmap_offset mmap so should be fine -- though there's a risk we may have contaminated the locks before rejection.) -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx