Quoting Bruce Chang (2019-11-13 23:11:04)
> Below is the call trace when this issue is hit:
>
> <3> [113.316247] BUG: sleeping function called from invalid context at mm/page_alloc.c:4653
> <3> [113.318190] in_atomic(): 1, irqs_disabled(): 0, pid: 678, name: debugfs_test
> <4> [113.319900] no locks held by debugfs_test/678.
> <3> [113.321002] Preemption disabled at:
> <4> [113.321130] [<ffffffffa02506d4>] i915_error_object_create+0x494/0x610 [i915]
> <4> [113.327259] Call Trace:
> <4> [113.327871] dump_stack+0x67/0x9b
> <4> [113.328683] ___might_sleep+0x167/0x250
> <4> [113.329618] __alloc_pages_nodemask+0x26b/0x1110
> <4> [113.330731] ? ___slab_alloc.constprop.34+0x21c/0x380
> <4> [113.331943] ? ___slab_alloc.constprop.34+0x21c/0x380
> <4> [113.333169] ? __slab_alloc.isra.28.constprop.33+0x4d/0x70
> <4> [113.334614] pool_alloc.constprop.19+0x14/0x60 [i915]
> <4> [113.335951] compress_page+0x7c/0x100 [i915]
> <4> [113.337110] i915_error_object_create+0x4bd/0x610 [i915]
> <4> [113.338515] i915_capture_gpu_state+0x384/0x1680 [i915]
> <4> [113.339771] ? __lock_acquire+0x4ac/0x1e90
> <4> [113.340785] ? _raw_spin_lock_irqsave_nested+0x1/0x50
> <4> [113.342127] i915_gpu_info_open+0x44/0x70 [i915]
> <4> [113.343243] full_proxy_open+0x139/0x1b0
> <4> [113.344196] ? open_proxy_open+0xc0/0xc0
> <4> [113.345149] do_dentry_open+0x1ce/0x3a0
> <4> [113.346084] path_openat+0x4c9/0xac0
> <4> [113.346967] do_filp_open+0x96/0x110
> <4> [113.347848] ? __alloc_fd+0xe0/0x1f0
> <4> [113.348736] ? do_sys_open+0x1b8/0x250
> <4> [113.349647] do_sys_open+0x1b8/0x250
> <4> [113.350526] do_syscall_64+0x55/0x1c0
> <4> [113.351418] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> After io_mapping_map_atomic_wc()/kmap_atomic(), the kernel is in atomic
> context, but compress_page then calls pool_alloc with the GFP_KERNEL flag,
> which can potentially go to sleep. When the kernel is in atomic context,
> sleeping is not allowed. This is why this bug got triggered.

The last 2 sentences are redundant.
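To make the failure mode concrete, here is a toy userspace model (not kernel code) of the rule the trace is enforcing: an atomic mapping bumps a preemption counter, and an allocation that may sleep, such as a GFP_KERNEL one, is flagged if it runs while that counter is nonzero. Every name in it (toy_map_atomic, toy_alloc, capture_page and so on) is hypothetical; the real behaviour lives in kmap_atomic(), io_mapping_map_atomic_wc() and might_sleep().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model only: hypothetical names, not the kernel's implementation. */
static int preempt_count;    /* > 0 means "in atomic context" */
static int sleep_violations; /* may-sleep calls made while atomic */

enum toy_gfp { TOY_GFP_ATOMIC, TOY_GFP_KERNEL };

/* Stand-in for might_sleep(): complain if we are in atomic context. */
static void toy_might_sleep(void)
{
	if (preempt_count > 0) {
		sleep_violations++;
		fprintf(stderr, "BUG: sleeping function called from invalid context\n");
	}
}

/* Stand-in for an allocator: only GFP_KERNEL-style requests may sleep. */
static void toy_alloc(enum toy_gfp flags)
{
	if (flags == TOY_GFP_KERNEL)
		toy_might_sleep();
}

/* Atomic mapping: preemption stays disabled until the unmap. */
static void toy_map_atomic(void) { preempt_count++; }
static void toy_unmap_atomic(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

/* Non-atomic mapping: preemption is left enabled. */
static void toy_map(void) { }
static void toy_unmap(void) { }

/* Map a page, then "compress" it, which allocates like pool_alloc(). */
static void capture_page(bool use_atomic_map)
{
	if (use_atomic_map)
		toy_map_atomic();
	else
		toy_map();

	toy_alloc(TOY_GFP_KERNEL); /* compress_page() -> pool_alloc() */

	if (use_atomic_map)
		toy_unmap_atomic();
	else
		toy_unmap();
}
```

In this model, capture_page(true) records one violation, mirroring the BUG above, while capture_page(false), i.e. option 1 from the patch, records none.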
> In order to fix this issue, we can either
> 1) not enter atomic context, i.e., use the non-atomic versions of
>    functions like io_mapping_map_wc/kmap,
> or
> 2) make compress_page run in atomic context.
>
> But it is not a good idea to run slow compression in atomic context, so
> 1) above is the preferred solution, and it is what this patch implements.

Reasonable, though we have done capture inside atomic context before and
may have to do so again. (Dropping the atomicity is a recent change that
has generated a surprising amount of controversy.)

> Signed-off-by: Bruce Chang <yu.bruce.chang@xxxxxxxxx>
> Reviewed-by: Brian Welty <brian.welty@xxxxxxxxx>
> Fixes: 895d8ebeaa924 ("drm/i915: error capture with no ggtt slot")

Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx