Re: ✗ Fi.CI.BAT: warning for series starting with [1/2] drm/i915: introduce & use i915_gem_object_mark_dirty()

Daniel Vetter <daniel@xxxxxxxx> · Mon, 2 May 2016 10:58:57 +0200



On Thu, Apr 28, 2016 at 07:36:32PM +0100, Dave Gordon wrote:
> On 28/04/16 18:48, Patchwork wrote:
> >== Series Details ==
> >
> >Series: series starting with [1/2] drm/i915: introduce & use i915_gem_object_mark_dirty()
> >URL   : https://patchwork.freedesktop.org/series/6491/
> >State : warning
> >
> >== Summary ==
> >
> >Series 6491v1 Series without cover letter
> >http://patchwork.freedesktop.org/api/1.0/series/6491/revisions/1/mbox/
> >
> >Test gem_busy:
> >         Subgroup basic-blt:
> >                 pass       -> DMESG-WARN (bsw-nuc-2)
> >                 pass       -> DMESG-WARN (skl-nuci5)
> >                 pass       -> DMESG-WARN (bdw-nuci7-2)
> >                 pass       -> DMESG-WARN (ivb-t430s)
> >                 pass       -> DMESG-WARN (bdw-ultra)
> >                 pass       -> DMESG-WARN (skl-i7k-2)
> >                 pass       -> DMESG-WARN (byt-nuc)
> >                 pass       -> DMESG-WARN (snb-x220t)
> >                 pass       -> DMESG-WARN (hsw-brixbox)
> 
> Well, that's as expected: it's hitting the WARN_ON() that I put in there to
> check on usage of obj->dirty vs. pages_pin_count. Stack traces are all the
> same, like this one:
> 
> [   72.459223] ------------[ cut here ]------------
> [   72.459254] WARNING: CPU: 0 PID: 6012 at drivers/gpu/drm/i915/i915_drv.h:3027 i915_gem_object_set_to_gtt_domain+0x21c/0x280 [i915]
> [   72.459255] WARN_ON(obj->pages_pin_count == 0)
> [   72.459256] Modules linked in:
> [   72.459257]  snd_hda_codec_hdmi snd_hda_intel i915 x86_pkg_temp_thermal snd_hda_codec snd_hwdep snd_hda_core intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul mei_me lpc_ich ghash_clmulni_intel mei snd_pcm r8169 mii
> [   72.459266] CPU: 0 PID: 6012 Comm: gem_busy Tainted: G        W       4.6.0-rc5-CI-Patchwork_2105+ #1
> [   72.459267] Hardware name: Gigabyte Technology Co., Ltd. H87M-D3H/H87M-D3H, BIOS F11 08/18/2015
> [   72.459268]  0000000000000000 ffff880212053c80 ffffffff8140de35 ffff880212053cd0
> [   72.459270]  0000000000000000 ffff880212053cc0 ffffffff81079c8c 00000bd312e5a980
> [   72.459272]  ffff880212e5a980 0000000000000001 ffff8800d7c70000 0000000000000000
> [   72.459274] Call Trace:
> [   72.459277]  [<ffffffff8140de35>] dump_stack+0x67/0x92
> [   72.459280]  [<ffffffff81079c8c>] __warn+0xcc/0xf0
> [   72.459281]  [<ffffffff81079cfa>] warn_slowpath_fmt+0x4a/0x50
> [   72.459293]  [<ffffffffa01efacb>] ? i915_gem_object_flush_cpu_write_domain.part.47+0x14b/0x1b0 [i915]
> [   72.459303]  [<ffffffffa01f113c>] i915_gem_object_set_to_gtt_domain+0x21c/0x280 [i915]
> [   72.459313]  [<ffffffffa01f128e>] i915_gem_set_domain_ioctl+0xee/0x160 [i915]
> [   72.459315]  [<ffffffff815282ed>] drm_ioctl+0x13d/0x590
> [   72.459325]  [<ffffffffa01f11a0>] ? i915_gem_object_set_to_gtt_domain+0x280/0x280 [i915]
> [   72.459327]  [<ffffffff81199ba7>] ? handle_mm_fault+0x47/0x1e90
> [   72.459329]  [<ffffffff811ee38a>] do_vfs_ioctl+0x8a/0x670
> [   72.459331]  [<ffffffff811fa21a>] ? __fget_light+0x6a/0x90
> [   72.459332]  [<ffffffff811ee9ac>] SyS_ioctl+0x3c/0x70
> [   72.459333]  [<ffffffff817dc7a9>] entry_SYSCALL_64_fastpath+0x1c/0xac
> [   72.459334] ---[ end trace 156adc997a22f992 ]---
> 
> So, is that a bug, marking an object dirty when pages_pin_count is 0? Does
> that mean that a program can set a BO to the GTT domain (or the CPU
> domain?), update its contents, and then it gets paged out due to memory
> pressure and the updates are lost?
> 
> Or ... no, I think the problem scenario would be
> * set to GTT => mark dirty
> * BO paged out => flushed to swap, marked clean
> * BO paged in => still clean
> * update contents => still clean?
> * get paged out => not written out?
> 
> Or are we guaranteed to hit another mark_dirty during the process of
> updating the contents of the paged-in buffer?

I didn't think through the details, but these kinds of scenarios are
generally the really "fun" GEM bugs that bite hard two years later.
Writing an igt to reproduce your scenario and simply checking whether
there's a problem would be great. Worst case, it's a good exercise in
what kind of tricks are needed to fool the kernel.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx