Re: [PATCH] drm/i915: More vma fixups around unbind/destroy

Daniel Vetter <daniel@xxxxxxxx> · Mon, 26 Aug 2013 21:04:50 +0200

On Mon, Aug 26, 2013 at 10:36:35AM +0100, Chris Wilson wrote:
> On Mon, Aug 26, 2013 at 11:23:47AM +0200, Daniel Vetter wrote:
> > The important bugfix here is that we must not unlink the vma when
> > we keep it around as a placeholder for the execbuf code. Since then we
> > won't find it again when execbuf gets interrupt and restarted and
> > create a 2nd vma. And since the code as-is isn't fit yet to deal with
> > more than one vma, hilarity ensues.
> > 
> > Specifically the dma map/unmap of the sg table isn't adjusted for
> > multiple vmas yet and will blow up like this:
> > 
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> > IP: [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
> > PGD 56bb5067 PUD ad3dd067 PMD 0
> > Oops: 0000 [#1] SMP
> > Modules linked in: tcp_lp ppdev parport_pc lp parport ipv6 dm_mod dcdbas snd_hda_codec_hdmi pcspkr snd_hda_codec_realtek serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_codec lpc_ich snd_hwdep mfd_core snd_pcm snd_page_alloc snd_timer snd soundcore acpi_cpufreq i915 video button drm_kms_helper drm mperf freq_table
> > CPU: 1 PID: 16650 Comm: fbo-maxsize Not tainted 3.11.0-rc4_nightlytop_d93f59_debug_20130814_+ #6957
> > Hardware name: Dell Inc. OptiPlex 9010/03JR84, BIOS A01 05/04/2012
> > task: ffff8800563b3f00 ti: ffff88004bdf4000 task.ti: ffff88004bdf4000
> > RIP: 0010:[<ffffffffa008fb37>]  [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
> > RSP: 0018:ffff88004bdf5958  EFLAGS: 00010246
> > RAX: 0000000000000000 RBX: ffff8801135e0000 RCX: ffff8800ad3bf8e0
> > RDX: ffff8800ad3bf8e0 RSI: 0000000000000000 RDI: ffff8801007ee780
> > RBP: ffff88004bdf5978 R08: ffff8800ad3bf8e0 R09: 0000000000000000
> > R10: ffffffff86ca1810 R11: ffff880036a17101 R12: ffff8801007ee780
> > R13: 0000000000018001 R14: ffff880118c4e000 R15: ffff8801007ee780
> > FS:  00007f401a0ce740(0000) GS:ffff88011e280000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000008 CR3: 000000005635c000 CR4: 00000000001407e0
> > Stack:
> >  ffff8801007ee780 ffff88005c253180 0000000000018000 ffff8801135e0000
> >  ffff88004bdf59a8 ffffffffa0088e55 0000000000000011 ffff8801007eec00
> >  0000000000018000 ffff880036a17101 ffff88004bdf5a08 ffffffffa0089026
> > Call Trace:
> >  [<ffffffffa0088e55>] i915_vma_unbind+0xdf/0x1ab [i915]
> >  [<ffffffffa0089026>] __i915_gem_shrink+0x105/0x177 [i915]
> >  [<ffffffffa0089452>] i915_gem_object_get_pages_gtt+0x108/0x309 [i915]
> >  [<ffffffffa0085ba9>] i915_gem_object_get_pages+0x61/0x90 [i915]
> >  [<ffffffffa008f22b>] ? gen6_ppgtt_insert_entries+0x103/0x125 [i915]
> >  [<ffffffffa008a113>] i915_gem_object_pin+0x1fa/0x5df [i915]
> >  [<ffffffffa008cdfe>] i915_gem_execbuffer_reserve_object.isra.6+0x8d/0x1bc [i915]
> >  [<ffffffffa008d156>] i915_gem_execbuffer_reserve+0x229/0x367 [i915]
> >  [<ffffffffa008dbf6>] i915_gem_do_execbuffer.isra.12+0x4dc/0xf3a [i915]
> >  [<ffffffff810fc823>] ? might_fault+0x40/0x90
> >  [<ffffffffa008eb89>] i915_gem_execbuffer2+0x187/0x222 [i915]
> >  [<ffffffffa000971c>] drm_ioctl+0x308/0x442 [drm]
> >  [<ffffffffa008ea02>] ? i915_gem_execbuffer+0x3ae/0x3ae [i915]
> >  [<ffffffff817db156>] ? __do_page_fault+0x3dd/0x481
> >  [<ffffffff8112fdba>] vfs_ioctl+0x26/0x39
> >  [<ffffffff811306a2>] do_vfs_ioctl+0x40e/0x451
> >  [<ffffffff817deda7>] ? sysret_check+0x1b/0x56
> >  [<ffffffff8113073c>] SyS_ioctl+0x57/0x87
> >  [<ffffffff8135bbfe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> >  [<ffffffff817ded82>] system_call_fastpath+0x16/0x1b
> > Code: 48 c7 c6 84 30 0e a0 31 c0 e8 d0 e9 f7 ff bf c6 a7 00 00 e8 07 af 2c e1 41 f6 84 24 03 01 00 00 10 75 44 49 8b 84 24 08 01 00 00 <8b> 50 08 48 8b 30 49 8b 86 b0 04 00 00 48 89 c7 48 81 c7 98 00
> > RIP  [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
> >  RSP <ffff88004bdf5958>
> > CR2: 0000000000000008
> > 
> > As a consequence we need to change the "only one vma for now" check in
> > vma_unbind - since vma_destroy isn't always called the obj->vma_list
> > might not be empty. Instead check that the vma list is singular at the
> > beginning of vma_unbind. This is also more symmetric with bind_to_vm.
> > 
> > v2:
> > - Add a paranoid WARN to mark_free in the eviction code to make sure
> >   we never try to evict a vma used by the execbuf code right now.
> > - Move the check for a temporary execbuf vma into vma_destroy -
> >   otherwise the failure path cleanup in bind_to_vm will blow up.
> > 
> > Our first attempting at fixing this was
> > 
> > commit 1be81a2f2cfd8789a627401d470423358fba2d76
> > Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Date:   Tue Aug 20 12:56:40 2013 +0100
> > 
> >     drm/i915: Don't destroy the vma placeholder during execbuffer reservation
> > 
> > Squash with this when merging!
> > 
> > v3: Improvements suggested in Chris' review:
> > - Move the WARN_ON in vma_destroy that checks for vmas with an drm_mm
> >   allocation before the early return.
> > - Bail out if we hit the WARN in mark_free to hopefully make the
> >   kernel survive for long enough to capture it.
> > 
> > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Ben Widawsky <ben@xxxxxxxxxxxx>
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171
> > Tested-by: lu hua <huax.lu@xxxxxxxxx> (v2)
> > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
> 
> You have been favoured by my mailer, you win a
> Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>

Queued for -next, thanks for the review.

> Would be nice if we could just roll the fixes together into one...

They're in one tight group, so bisect fail is actually minimized. And
given that it took a while for us to understand this mess I think it's
prudent to keep the fixups separate in case they've introduced issues of
their own.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx