On 08/01/15 13:40, Mika Kuoppala wrote:
> i915_gem_validate_context() will check the engine->state to see if it can
> submit into a ringbuffer. But when we are releasing the context we leave the
> engine state to a non null value. Thus after a successful hang recovery
> we might mistakenly submit to a non initialized ringbuffer resulting in:
>
> [ 1991.356418] ------------[ cut here ]------------
> [ 1991.359192] WARNING: CPU: 1 PID: 2335 at lib/iomap.c:43 bad_io_access+0x3d/0x40()
> [ 1991.361966] Bad IO access at port 0x24 (outl(val,port))
> [ 1991.364750] Modules linked in: snd_hda_codec_hdmi i915 x86_pkg_temp_thermal coretemp kvm_intel kvm snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel aes_x86_64 glue_helper lrw i2c_algo_bit gf128mul ablk_helper drm_kms_helper cryptd snd_seq_midi snd_seq_midi_event serio_raw drm snd_rawmidi snd_seq snd_seq_device snd_timer video snd soundcore mei_me lpc_ich bnep mac_hid acpi_pad mei rfcomm bluetooth parport_pc ppdev lp parport nls_iso8859_1 e1000e ptp ahci libahci pps_core sdhci_acpi sdhci
> [ 1991.370827] CPU: 1 PID: 2335 Comm: gem_ringfill Tainted: G W 3.19.0-rc3+ #50
> [ 1991.373838] Hardware name: Intel Corporation Broadwell Client platform/SawTooth Peak, BIOS BDW-E1R1.86C.0092.R00.1408311942 08/31/2014
> [ 1991.376902] ffffffff81aa1a46 ffff88014910fac8 ffffffff8173dbcf 0000000000000001
> [ 1991.379978] ffff88014910fb18 ffff88014910fb08 ffffffff8107007a ffff88014910fb28
> [ 1991.383037] ffff880147209940 ffff8800aafa8718 ffff8800aafa0000 ffff8800aafa1918
> [ 1991.386094] Call Trace:
> [ 1991.389140] [<ffffffff8173dbcf>] dump_stack+0x45/0x57
> [ 1991.392207] [<ffffffff8107007a>] warn_slowpath_common+0x8a/0xc0
> [ 1991.395268] [<ffffffff810700f6>] warn_slowpath_fmt+0x46/0x50
> [ 1991.398330] [<ffffffffa053290c>] ? intel_logical_ring_begin+0x3c/0x240 [i915]
> [ 1991.401395] [<ffffffff813985bd>] bad_io_access+0x3d/0x40
> [ 1991.404462] [<ffffffff81398763>] iowrite32+0x33/0x40
> [ 1991.407529] [<ffffffffa0533585>] gen8_init_rcs_context+0xd5/0x170 [i915]
> [ 1991.410605] [<ffffffffa0533d17>] intel_lr_context_deferred_create+0x657/0x8e0 [i915]
> [ 1991.413668] [<ffffffffa050eff1>] i915_gem_do_execbuffer.isra.22+0xed1/0xf60 [i915]
> [ 1991.416736] [<ffffffff811c0125>] ? __kmalloc+0x55/0x1b0
> [ 1991.419801] [<ffffffffa051029c>] ? i915_gem_execbuffer2+0x6c/0x2c0 [i915]
> [ 1991.422772] [<ffffffffa05102e1>] i915_gem_execbuffer2+0xb1/0x2c0 [i915]
> [ 1991.425632] [<ffffffffa01b8ab4>] drm_ioctl+0x1a4/0x630 [drm]
> [ 1991.428454] [<ffffffff811258bc>] ? acct_account_cputime+0x1c/0x20
> [ 1991.431255] [<ffffffff811ee378>] do_vfs_ioctl+0x2f8/0x510
> [ 1991.434009] [<ffffffff8109f834>] ? vtime_account_user+0x54/0x60
> [ 1991.436778] [<ffffffff811ee611>] SyS_ioctl+0x81/0xa0
> [ 1991.439553] [<ffffffff81745cb4>] ? int_check_syscall_exit_work+0x34/0x3d
> [ 1991.442306] [<ffffffff81745a2d>] system_call_fastpath+0x16/0x1b
>
> Fix this by setting all the engine fields properly when lrc is freed.
>
> Cc: Thomas Daniel <thomas.daniel@xxxxxxxxx>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 7670a0f..32684d9 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1777,6 +1777,10 @@ void intel_lr_context_free(struct intel_context *ctx)
>  			intel_destroy_ringbuffer_obj(ringbuf);
>  			kfree(ringbuf);
>  			drm_gem_object_unreference(&ctx_obj->base);
> +			WARN_ON(ctx->engine[i].unpin_count != 0);
> +			ctx->engine[i].unpin_count = 0;
> +			ctx->engine[i].ringbuf = NULL;
> +			ctx->engine[i].state = NULL;
>  		}
>  	}
>  }

Hi,

I don't quite see how this can fix the problem illustrated by the stack
trace above. AFAICS intel_lr_context_free() is called /only/ from
i915_gem_context_free(), which should mean that the refcount on the
intel_context object is already zero, and that it will be freed on
return. So the contents of ctx->engine[] should be irrelevant ...

void i915_gem_context_free(struct kref *ctx_ref)
{
	struct intel_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);

	trace_i915_context_free(ctx);

	if (i915.enable_execlists)
		intel_lr_context_free(ctx);

	i915_ppgtt_put(ctx->ppgtt);

	if (ctx->legacy_hw_ctx.rcs_state)
		drm_gem_object_unreference(&ctx->legacy_hw_ctx.rcs_state->base);
	list_del(&ctx->link);
	kfree(ctx);
}

...
unless something is trying to reuse the context while it is still in
the process of being deleted :(

In addition, the stack trace above implies that ctx->engine[].state WAS
NULL when i915_gem_validate_context() was called; otherwise it would not
have called intel_lr_context_deferred_create():

	if (i915.enable_execlists && !ctx->engine[ring->id].state) {
		int ret = intel_lr_context_deferred_create(ctx, ring);
		if (ret) {
			DRM_DEBUG("Could not create LRC %u: %d\n", ctx_id, ret);
			return ERR_PTR(ret);
		}
	}

and likewise that function would not have called gen8_init_rcs_context()
unless this was a new context:

	if (ctx == ring->default_context)
		lrc_setup_hardware_status_page(ring, ctx_obj);
	else if (ring->id == RCS && !ctx->rcs_initialized) {
		if (ring->init_context) {
			ret = ring->init_context(ring, ctx);
			if (ret) {
				DRM_ERROR("ring init context: %d\n", ret);
				ctx->engine[ring->id].ringbuf = NULL;
				ctx->engine[ring->id].state = NULL;
				goto error;
			}
		}

		ctx->rcs_initialized = true;
	}

Note that rcs_initialized is never cleared, even with your change, so in
a use-after-free situation we wouldn't end up in this path.

So I think the mystery is how this context ended up in an inconsistent
state: has it been partially freed and then reused, or has some part of
the new context allocation path failed without being unwound correctly?

And if setting a pointer to NULL inside a structure that is in the
process of being freed actually makes a difference, doesn't that mean
there's a use-after-free issue somewhere?

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx