Re: [PATCH] drm/i915: Keep ring->active_list and ring->requests_list consistent

On Fri, Mar 20, 2015 at 01:39:51PM +0000, Chris Wilson wrote:
> On Fri, Mar 20, 2015 at 01:02:10PM +0000, Chris Wilson wrote:
> > On Fri, Mar 20, 2015 at 11:06:57AM +0100, Daniel Vetter wrote:
> > > On Thu, Mar 19, 2015 at 10:17:42PM +0000, Chris Wilson wrote:
> > > > On Thu, Mar 19, 2015 at 06:37:28PM +0100, Daniel Vetter wrote:
> > > > > On Wed, Mar 18, 2015 at 06:19:22PM +0000, Chris Wilson wrote:
> > > > > > 	WARNING: CPU: 0 PID: 1383 at drivers/gpu/drm/i915/i915_gem_evict.c:279 i915_gem_evict_vm+0x10c/0x140()
> > > > > > 	WARN_ON(!list_empty(&vm->active_list))
> > > > > 
> > > > > How does this come about - we call gpu_idle before this seems to blow up,
> > > > > so all requests should be completed?
> > > > 
> > > > Honestly, I couldn't figure it out either. I had an epiphany when I saw
> > > > that we could now have an empty request list but non-empty active list,
> > > > added a test to detect when that happens and shouted eureka when the
> > > > WARN fired. I could trigger the WARN in evict_vm pretty reliably, but
> > > > not since this patch. It could just be masking another bug.
> > > 
> > > Can you perhaps double-check the theory by putting a
> > > WARN_ON(list_empty(active_list) != list_empty(request_list)) into
> > > gpu_idle? Ofc with this patch reverted so that the bug surfaces again.
> > 
> > [ 5215.567573] [drm:i915_verify_lists] *ERROR* render ring: active list not empty, but no requests
> > [ 5215.567586] ------------[ cut here ]------------
> > [ 5215.567598] WARNING: CPU: 0 PID: 1304 at drivers/gpu/drm/i915/i915_gem.c:3166 i915_gpu_idle+0x88/0x90()
> > [ 5215.567602] WARN_ON(i915_verify_lists(dev))
> > [ 5215.567606] Modules linked in: ctr ccm arc4 ath9k ath9k_common ath9k_hw bnep ath mac80211 rfcomm snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec uvcvideo snd_hwdep snd_pcm gpio_ich videobuf2_vmalloc dell_wmi cfg80211 videobuf2_memops sparse_keymap videobuf2_core dell_laptop snd_seq_midi v4l2_common dcdbas snd_seq_midi_event btusb videodev i8k snd_rawmidi snd_seq hid_multitouch coretemp bluetooth microcode snd_seq_device joydev snd_timer serio_raw snd shpchp soundcore wmi lpc_ich usbhid hid psmouse ahci libahci
> > [ 5215.567708] CPU: 0 PID: 1304 Comm: Xorg Tainted: G        W  OE   4.0.0-rc4+ #108
> > [ 5215.567713] Hardware name: Dell Inc. Inspiron 1090/Inspiron 1090, BIOS A06 08/23/2011
> > [ 5215.567718]  00000000 00000000 f46e1b98 c16b3e19 f46e1bd8 f46e1bc8 c1047f17 c1937e78
> > [ 5215.567733]  f46e1bf4 00000518 c1937cec 00000c5e c14441e8 c14441e8 e733bdc8 00000000
> > [ 5215.567747]  f6346c00 f46e1be0 c1047f83 00000009 f46e1bd8 c1937e78 f46e1bf4 f46e1c00
> > [ 5215.567762] Call Trace:
> > [ 5215.567776]  [<c16b3e19>] dump_stack+0x41/0x52
> > [ 5215.567788]  [<c1047f17>] warn_slowpath_common+0x87/0xc0
> > [ 5215.567797]  [<c14441e8>] ? i915_gpu_idle+0x88/0x90
> > [ 5215.567805]  [<c14441e8>] ? i915_gpu_idle+0x88/0x90
> > [ 5215.567815]  [<c1047f83>] warn_slowpath_fmt+0x33/0x40
> > [ 5215.567823]  [<c14441e8>] i915_gpu_idle+0x88/0x90
> > [ 5215.567833]  [<c1439949>] i915_gem_evict_something+0x269/0x300
> > [ 5215.567843]  [<c144754f>] i915_gem_object_do_pin+0x6ef/0xb20
> > [ 5215.567854]  [<c14479c5>] i915_gem_object_pin+0x45/0x50
> > [ 5215.567864]  [<c1439f08>] i915_gem_execbuffer_reserve_vma.isra.13+0x78/0x180
> > [ 5215.567874]  [<c143a2e5>] i915_gem_execbuffer_reserve+0x2d5/0x320
> > [ 5215.567884]  [<c11594cd>] ? __kmalloc+0x14d/0x190
> > [ 5215.567894]  [<c143b6d9>] i915_gem_do_execbuffer.isra.17+0x5c9/0xdd0
> > [ 5215.567906]  [<c112efdb>] ? vm_mmap_pgoff+0x7b/0xa0
> > [ 5215.567915]  [<c11594cd>] ? __kmalloc+0x14d/0x190
> > [ 5215.567925]  [<c143cfeb>] i915_gem_execbuffer2+0x8b/0x2c0
> > [ 5215.567934]  [<c143cf60>] ? i915_gem_execbuffer+0x4e0/0x4e0
> > [ 5215.567944]  [<c1401d67>] drm_ioctl+0x1b7/0x510
> > [ 5215.567954]  [<c1120a9a>] ? balance_dirty_pages_ratelimited+0x1a/0x6a0
> > [ 5215.567963]  [<c143cf60>] ? i915_gem_execbuffer+0x4e0/0x4e0
> > [ 5215.567975]  [<c113cef9>] ? handle_mm_fault+0x329/0x1250
> > [ 5215.567984]  [<c1401bb0>] ? drm_getmap+0xb0/0xb0
> > [ 5215.567994]  [<c117d9ca>] do_vfs_ioctl+0x30a/0x530
> > [ 5215.568005]  [<c10a9e92>] ? ktime_get_ts64+0x52/0x1a0
> > [ 5215.568095]  [<c1185f62>] ? __fget_light+0x22/0x60
> > [ 5215.568136]  [<c117dc50>] SyS_ioctl+0x60/0x90
> > [ 5215.568175]  [<c16b9bc8>] sysenter_do_call+0x12/0x12
> > [ 5215.568198] ---[ end trace ab3f7e4953cb9eb6 ]---
> > [ 5215.568272] ------------[ cut here ]------------
> > [ 5215.568288] WARNING: CPU: 0 PID: 1304 at drivers/gpu/drm/i915/i915_gem_evict.c:283 i915_gem_evict_vm+0x10c/0x140()
> > [ 5215.568292] WARN_ON(!list_empty(&vm->active_list))
> > [ 5215.568296] Modules linked in: ctr ccm arc4 ath9k ath9k_common ath9k_hw bnep ath mac80211 rfcomm snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec uvcvideo snd_hwdep snd_pcm gpio_ich videobuf2_vmalloc dell_wmi cfg80211 videobuf2_memops sparse_keymap videobuf2_core dell_laptop snd_seq_midi v4l2_common dcdbas snd_seq_midi_event btusb videodev i8k snd_rawmidi snd_seq hid_multitouch coretemp bluetooth microcode snd_seq_device joydev snd_timer serio_raw snd shpchp soundcore wmi lpc_ich usbhid hid psmouse ahci libahci
> > [ 5215.568383] CPU: 0 PID: 1304 Comm: Xorg Tainted: G        W  OE   4.0.0-rc4+ #108
> > [ 5215.568388] Hardware name: Dell Inc. Inspiron 1090/Inspiron 1090, BIOS A06 08/23/2011
> > [ 5215.568393]  00000000 00000000 f46e1cc0 c16b3e19 f46e1d00 f46e1cf0 c1047f17 c193712c
> > [ 5215.568407]  f46e1d1c 00000518 c19370d0 0000011b c1439c6c c1439c6c f3b225b0 e733c3ec
> > [ 5215.568421]  00000001 f46e1d08 c1047f83 00000009 f46e1d00 c193712c f46e1d1c f46e1d28
> > [ 5215.568435] Call Trace:
> > [ 5215.568445]  [<c16b3e19>] dump_stack+0x41/0x52
> > [ 5215.568455]  [<c1047f17>] warn_slowpath_common+0x87/0xc0
> > [ 5215.568465]  [<c1439c6c>] ? i915_gem_evict_vm+0x10c/0x140
> > [ 5215.568474]  [<c1439c6c>] ? i915_gem_evict_vm+0x10c/0x140
> > [ 5215.568483]  [<c1047f83>] warn_slowpath_fmt+0x33/0x40
> > [ 5215.568492]  [<c1439c6c>] i915_gem_evict_vm+0x10c/0x140
> > [ 5215.568502]  [<c143a236>] i915_gem_execbuffer_reserve+0x226/0x320
> > [ 5215.568511]  [<c11594cd>] ? __kmalloc+0x14d/0x190
> > [ 5215.568521]  [<c143b6d9>] i915_gem_do_execbuffer.isra.17+0x5c9/0xdd0
> > [ 5215.568532]  [<c112efdb>] ? vm_mmap_pgoff+0x7b/0xa0
> > [ 5215.568541]  [<c11594cd>] ? __kmalloc+0x14d/0x190
> > [ 5215.568550]  [<c143cfeb>] i915_gem_execbuffer2+0x8b/0x2c0
> > [ 5215.568560]  [<c143cf60>] ? i915_gem_execbuffer+0x4e0/0x4e0
> > [ 5215.568568]  [<c1401d67>] drm_ioctl+0x1b7/0x510
> > [ 5215.568577]  [<c1120a9a>] ? balance_dirty_pages_ratelimited+0x1a/0x6a0
> > [ 5215.568587]  [<c143cf60>] ? i915_gem_execbuffer+0x4e0/0x4e0
> > [ 5215.568599]  [<c113cef9>] ? handle_mm_fault+0x329/0x1250
> > [ 5215.568607]  [<c1401bb0>] ? drm_getmap+0xb0/0xb0
> > [ 5215.568616]  [<c117d9ca>] do_vfs_ioctl+0x30a/0x530
> > [ 5215.568626]  [<c10a9e92>] ? ktime_get_ts64+0x52/0x1a0
> > [ 5215.568635]  [<c1185f62>] ? __fget_light+0x22/0x60
> > [ 5215.568644]  [<c117dc50>] SyS_ioctl+0x60/0x90
> > [ 5215.568653]  [<c16b9bc8>] sysenter_do_call+0x12/0x12
> > [ 5215.568659] ---[ end trace ab3f7e4953cb9eb7 ]---

Ok, at least we have clear evidence now that the lists indeed seem to get
out of sync.

> Ah, so what it boils down to is that i915_gpu_idle() is a no-op here if
> list_empty(&ring->request_list) [intel_ring_idle:2176].
> 
> Missing link discovered, I think the bug fixed by the patch is indeed
> the same one that triggered the first WARN.

But if we do that short-circuiting in ring_idle then all the requests
_should_ be completed. Which means retire_request_ring should move all
buffers to the inactive list, even when we do that before retiring
requests.
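
To make the suspected sequence concrete, here is a rough user-space sketch.
It is not driver code: every toy_* name is made up for illustration, and the
"retire" step is reduced to draining both lists. It only models the two
things discussed above, the early return when the request list is empty and
the list_empty(active) != list_empty(request) consistency check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct toy_list {
	struct toy_list *next, *prev;
};

static void toy_list_init(struct toy_list *head)
{
	head->next = head->prev = head;
}

static bool toy_list_empty(const struct toy_list *head)
{
	return head->next == head;
}

static void toy_list_add_tail(struct toy_list *node, struct toy_list *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void toy_list_del_init(struct toy_list *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	toy_list_init(node);
}

/* Hypothetical ring with only the two lists under discussion. */
struct toy_ring {
	struct toy_list active_list;
	struct toy_list request_list;
};

static void toy_ring_init(struct toy_ring *ring)
{
	assert(ring != NULL);
	toy_list_init(&ring->active_list);
	toy_list_init(&ring->request_list);
}

/* Assumed model of the short-circuit: no requests means nothing to do. */
static void toy_ring_idle(struct toy_ring *ring)
{
	if (toy_list_empty(&ring->request_list))
		return;

	/* Stand-in for "wait, then retire": drain both lists. */
	while (!toy_list_empty(&ring->request_list))
		toy_list_del_init(ring->request_list.next);
	while (!toy_list_empty(&ring->active_list))
		toy_list_del_init(ring->active_list.next);
}

/* The consistency check suggested earlier in the thread. */
static bool toy_lists_inconsistent(const struct toy_ring *ring)
{
	return toy_list_empty(&ring->active_list) !=
	       toy_list_empty(&ring->request_list);
}
```

In this toy model, an object left on active_list with an empty request_list
survives toy_ring_idle() untouched, so the check stays true afterwards. It
says nothing about how the real lists get into that state in the first place.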

I'm still baffled and don't really understand what's going on here ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch




