Quoting Mika Kuoppala (2018-03-14 14:16:08) > Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes: > > > Not only is the context suspect to disappearing, but so is it's > > timeline. Under a lockless inspection of the requests for > > debugging from intel_engine_dump(), the context may already have been > > freed and we have to check before chasing the dangling pointer. > > > > [28033.681755] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel crct10dif_pclmul crc32_pclmul snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel snd_pcm mei_me mei i915 r8169 mii prime_numbers i2c_hid > > [28033.681796] CPU: 3 PID: 3058 Comm: gem_exec_schedu Tainted: G U 4.16.0-rc5+ #9 > > [28033.681804] Hardware name: Acer Aspire E5-575G/Ironman_SK , BIOS V1.12 08/02/2016 > > [28033.681834] RIP: 0010:print_request+0x2b/0xb0 [i915] > > [28033.681840] RSP: 0018:ffffc90004afbc18 EFLAGS: 00010202 > > [28033.681847] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8801921b5a40 RCX: 0000000000000006 > > [28033.681854] RDX: ffffc90004afbc60 RSI: ffff8801921b5a40 RDI: 0000000000000004 > > [28033.681861] RBP: ffffc90004afbd80 R08: 0000000000000000 R09: 0000000000000001 > > [28033.681868] R10: ffffc90004afbbd0 R11: ffffc90004afbc73 R12: ffffc90004afbc60 > > [28033.681875] R13: ffffc90004afbd80 R14: ffff8801d40ec670 R15: ffff8801921b5a40 > > [28033.681883] FS: 00007fbba5f6c8c0(0000) GS:ffff8801e8400000(0000) knlGS:0000000000000000 > > [28033.681891] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [28033.681897] CR2: 00007fbba5f8f000 CR3: 00000001b2efa002 CR4: 00000000003606e0 > > [28033.681904] Call Trace: > > [28033.681932] intel_engine_print_registers+0x6a7/0x930 [i915] > > [28033.681962] intel_engine_dump+0x30d/0x740 [i915] > > [28033.681971] ? seq_printf+0x3a/0x50 > > [28033.681995] i915_engine_info+0xb8/0xe0 [i915] > > [28033.682003] ? drm_get_color_range_name+0x20/0x20 > > [28033.682010] seq_read+0xe1/0x440 > > [28033.682018] full_proxy_read+0x51/0x80 > > [28033.682025] __vfs_read+0x21/0x130 > > [28033.682031] ? do_sys_open+0x134/0x220 > > [28033.682037] ? kmem_cache_free+0x177/0x2b0 > > [28033.682043] vfs_read+0xa1/0x150 > > [28033.682049] SyS_read+0x40/0xa0 > > [28033.682055] do_syscall_64+0x6b/0x1b0 > > [28033.682063] entry_SYSCALL_64_after_hwframe+0x42/0xb7 > > [28033.682069] RIP: 0033:0x7fbba4655d11 > > [28033.682074] RSP: 002b:00007ffd8c49da58 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 > > [28033.682082] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbba4655d11 > > [28033.682089] RDX: 000000000000003f RSI: 00005647bfbfc260 RDI: 0000000000000006 > > [28033.682096] RBP: 000000000000003f R08: 00000000ffffffff R09: 0000000000000000 > > [28033.682104] R10: 0000000000000000 R11: 0000000000000246 R12: 00005647bfbfc260 > > [28033.682111] R13: 0000000000000006 R14: 0000000000000000 R15: 00005647bfbfc260 > > [28033.682119] Code: 41 55 41 54 49 89 d4 55 53 48 89 fd 48 8b 86 c8 00 00 00 48 8b 3d d6 1e 14 e2 48 89 f3 48 2b be a8 02 00 00 48 8b 80 b0 00 00 00 <4c> 8b 68 18 e8 bc 80 02 e1 8b 8b 70 02 00 00 8b b3 28 02 00 00 > > [28033.682206] RIP: print_request+0x2b/0xb0 [i915] RSP: ffffc90004afbc18 > > > > Reported-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > > --- > > drivers/gpu/drm/i915/intel_engine_cs.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c > > index a2b1e9e2c008..f22c5f72df8d 100644 > > --- a/drivers/gpu/drm/i915/intel_engine_cs.c > > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c > > @@ -1716,13 +1716,15 @@ static void print_request(struct drm_printer *m, > > struct i915_request *rq, > > const char *prefix) > > { > > + const char *name = rq->fence.ops->get_timeline_name(&rq->fence); > > + > > So we get "signaled" And as we hold the rcu lock here, context can't also vanish between the test inside get_timeline_name() and our use in the printf. (Probably could do with another sentence to in get_timeline_name() to illustrate the rcu_read_lock() requirement in the caller.) > Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> Thanks, hopefully you will be able to run gem_ctx_switch all night now! -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx