On Tue, Apr 11, 2017 at 10:18:56AM +0100, Chris Wilson wrote: > If we *know* that the engine is idle, i.e. we have not more contexts in > lift, we can skip any spurious CSB idle interrupts. These spurious > interrupts seem to arrive long after we assert that the engines are > completely idle, triggering later assertions: > > [ 178.896646] intel_engine_is_idle(bcs): interrupt not handled, irq_posted=2 > [ 178.896655] ------------[ cut here ]------------ > [ 178.896658] kernel BUG at drivers/gpu/drm/i915/intel_engine_cs.c:226! > [ 178.896661] invalid opcode: 0000 [#1] SMP > [ 178.896663] Modules linked in: i915(E) x86_pkg_temp_thermal(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) intel_gtt(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) aesni_intel(E) prime_numbers(E) evdev(E) aes_x86_64(E) drm(E) crypto_simd(E) cryptd(E) glue_helper(E) mei_me(E) mei(E) lpc_ich(E) efivars(E) mfd_core(E) battery(E) video(E) acpi_pad(E) button(E) tpm_tis(E) tpm_tis_core(E) tpm(E) autofs4(E) i2c_i801(E) fan(E) thermal(E) i2c_designware_platform(E) i2c_designware_core(E) > [ 178.896694] CPU: 1 PID: 522 Comm: gem_exec_whispe Tainted: G E 4.11.0-rc5+ #14 > [ 178.896702] task: ffff88040aba8d40 task.stack: ffffc900003f0000 > [ 178.896722] RIP: 0010:intel_engine_init_global_seqno+0x1db/0x1f0 [i915] > [ 178.896725] RSP: 0018:ffffc900003f3ab0 EFLAGS: 00010246 > [ 178.896728] RAX: 0000000000000000 RBX: ffff88040af54000 RCX: 0000000000000000 > [ 178.896731] RDX: ffff88041ec933e0 RSI: ffff88041ec8cc48 RDI: ffff88041ec8cc48 > [ 178.896734] RBP: ffffc900003f3ac8 R08: 0000000000000000 R09: 000000000000047d > [ 178.896736] R10: 0000000000000040 R11: ffff88040b344f80 R12: 0000000000000000 > [ 178.896739] R13: ffff88040bce0000 R14: ffff88040bce52d8 R15: ffff88040bce0000 > [ 178.896742] FS: 00007f2cccc2d8c0(0000) GS:ffff88041ec80000(0000) knlGS:0000000000000000 > [ 178.896746] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 178.896749] CR2: 00007f41ddd8f000 CR3: 000000040bb03000 CR4: 00000000001406e0 > [ 178.896752] Call Trace: > [ 178.896768] reset_all_global_seqno.part.33+0x4e/0xd0 [i915] > [ 178.896782] i915_gem_request_alloc+0x304/0x330 [i915] > [ 178.896795] i915_gem_do_execbuffer+0x8a1/0x17d0 [i915] > [ 178.896799] ? remove_wait_queue+0x48/0x50 > [ 178.896812] ? i915_wait_request+0x300/0x590 [i915] > [ 178.896816] ? wake_up_q+0x70/0x70 > [ 178.896819] ? refcount_dec_and_test+0x11/0x20 > [ 178.896823] ? reservation_object_add_excl_fence+0xa5/0x100 > [ 178.896835] i915_gem_execbuffer2+0xab/0x1f0 [i915] > [ 178.896844] drm_ioctl+0x1e6/0x460 [drm] > [ 178.896858] ? i915_gem_execbuffer+0x260/0x260 [i915] > [ 178.896862] ? dput+0xcf/0x250 > [ 178.896866] ? full_proxy_release+0x66/0x80 > [ 178.896869] ? mntput+0x1f/0x30 > [ 178.896872] do_vfs_ioctl+0x8f/0x5b0 > [ 178.896875] ? ____fput+0x9/0x10 > [ 178.896878] ? task_work_run+0x80/0xa0 > [ 178.896881] SyS_ioctl+0x3c/0x70 > [ 178.896885] entry_SYSCALL_64_fastpath+0x17/0x98 > [ 178.896888] RIP: 0033:0x7f2ccb455ca7 > [ 178.896890] RSP: 002b:00007ffcabec72d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > [ 178.896894] RAX: ffffffffffffffda RBX: 000055f897a44b90 RCX: 00007f2ccb455ca7 > [ 178.896897] RDX: 00007ffcabec74a0 RSI: 0000000040406469 RDI: 0000000000000003 > [ 178.896900] RBP: 00007f2ccb70a440 R08: 00007f2ccb70d0a4 R09: 0000000000000000 > [ 178.896903] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > [ 178.896905] R13: 000055f89782d71a R14: 00007ffcabecf838 R15: 0000000000000003 > [ 178.896908] Code: 00 31 d2 4c 89 ef 8d 70 48 41 ff 95 f8 06 00 00 e9 68 fe ff ff be 0f 00 00 00 48 c7 c7 48 dc 37 a0 e8 fa 33 d6 e0 e9 0b ff ff ff <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 > > On the other hand, by ignoring the interrupt do we risk running out of > space in CSB ring? Testing for a few hours suggests not, i.e. that we > only seem to get the odd delayed CSB idle notification. The alternative is not to report upon engine->irq_posted from within intel_engine_is_idle() but that loses us an important safety check for e.g. suspend. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx