On 2018-06-07 10:44 AM, Michel Dänzer wrote:
> On 2018-05-31 08:05 PM, Christian König wrote:
>> On 2018-05-30 06:03 PM, Harry Wentland wrote:
>>> On 2018-05-30 06:17 AM, Shirish S wrote:
>>>> This patch fixes the warning messages caused by calling a sleeping
>>>> function in atomic context, as below:
>>>>
>>>> BUG: sleeping function called from invalid context at mm/slab.h:419
>>>> in_atomic(): 1, irqs_disabled(): 1, pid: 5, name: kworker/u4:0
>>>> CPU: 1 PID: 5 Comm: kworker/u4:0 Tainted: G       W      4.14.35 #941
>>>> Workqueue: events_unbound commit_work
>>>> Call Trace:
>>>>  dump_stack+0x4d/0x63
>>>>  ___might_sleep+0x11f/0x12e
>>>>  kmem_cache_alloc_trace+0x41/0xea
>>>>  dc_create_state+0x1f/0x30
>>>>  dc_commit_updates_for_stream+0x73/0x4cf
>>>>  ? amdgpu_get_crtc_scanoutpos+0x82/0x16b
>>>>  amdgpu_dm_do_flip+0x239/0x298
>>>>  amdgpu_dm_commit_planes.isra.23+0x379/0x54b
>>>>  ? dc_commit_state+0x3da/0x404
>>>>  amdgpu_dm_atomic_commit_tail+0x4fc/0x5d2
>>>>  ? wait_for_common+0x5b/0x69
>>>>  commit_tail+0x42/0x64
>>>>  process_one_work+0x1b0/0x314
>>>>  worker_thread+0x1cb/0x2c1
>>>>  ? create_worker+0x1da/0x1da
>>>>  kthread+0x156/0x15e
>>>>  ? kthread_flush_work+0xea/0xea
>>>>  ret_from_fork+0x22/0x40
>>>>
>>>> Signed-off-by: Shirish S <shirish.s at amd.com>
>>>> ---
>>>>  drivers/gpu/drm/amd/display/dc/core/dc.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
>>>> index 33149ed..d62206f 100644
>>>> --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
>>>> +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
>>>> @@ -588,7 +588,7 @@ static void disable_dangling_plane(struct dc *dc, struct dc_state *context)
>>>>
>>>>  struct dc *dc_create(const struct dc_init_data *init_params)
>>>>  {
>>>> -   struct dc *dc = kzalloc(sizeof(*dc), GFP_KERNEL);
>>>> +   struct dc *dc = kzalloc(sizeof(*dc), GFP_ATOMIC);
>>> Are you sure this one can be called in atomic context?
>>>
>>> If so, then everything in construct() would also need GFP_ATOMIC.
>>
>> Well, the backtrace is quite obvious, but I agree that the change still
>> looks fishy to me as well.
>>
>> Using GFP_ATOMIC should only be a last resort when nothing else helps,
>> but here it looks more like we misuse a spinlock where a mutex or
>> semaphore would be more appropriate.
>
> I want to raise this concern again, for all of these patches.
>
> I'm now seeing similar BUG messages triggered from VCE functions; see an
> example below. I had never seen these before today, so I assume they're
> caused by the "drm/amdgpu/pp: replace mutex with spin_lock (V3)" change.
>
> If we just mechanically convert a mutex to a spinlock whenever this
> warning appears, it could snowball, and we might end up converting a lot
> of mutexes to spinlocks. That could be bad if any of those locks can be
> held for significant amounts of time.
>
> Instead, we should look into why we end up in atomic context in the
> first place. All of the messages in these patches were triggered from
> either a worker thread or an ioctl, neither of which runs in atomic
> context per se. Most likely we end up in atomic context because a
> spinlock is held, so maybe the problem can be solved by converting those
> spinlocks to mutexes instead.

I fully agree with Michel. Shirish, please follow up on this.
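
To make the direction concrete, here is a minimal sketch of what Michel
is describing, with hypothetical "example" names rather than the actual
amdgpu locks: instead of switching allocations to GFP_ATOMIC so they
survive under a spinlock, relax the lock that creates the atomic context
to a mutex, so sleeping calls like kzalloc(GFP_KERNEL) stay legal:

#include <linux/mutex.h>
#include <linux/slab.h>

struct example_ctx {
	struct mutex lock;	/* was: spinlock_t lock */
	void *state;
};

/* mutex_init(&ctx->lock) is assumed to happen at creation time. */
static int example_update_state(struct example_ctx *ctx)
{
	void *state;

	/*
	 * mutex_lock() may sleep, which is fine here: worker threads
	 * and ioctls are not atomic by themselves.
	 */
	mutex_lock(&ctx->lock);

	/* No spinlock held, so a sleeping allocation is legal again. */
	state = kzalloc(64, GFP_KERNEL);
	if (!state) {
		mutex_unlock(&ctx->lock);
		return -ENOMEM;
	}

	kfree(ctx->state);
	ctx->state = state;

	mutex_unlock(&ctx->lock);
	return 0;
}

The tradeoff is that a mutex cannot be taken from interrupt context or
with preemption disabled, so this only works if every path that takes
the lock is allowed to sleep.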
Harry

> [ 6232.096387] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
> [ 6232.099544] in_atomic(): 1, irqs_disabled(): 0, pid: 14785, name: kworker/3:14
> [ 6232.102183] INFO: lockdep is turned off.
> [ 6232.104965] CPU: 3 PID: 14785 Comm: kworker/3:14 Tainted: G B D W OE 4.16.0-rc7+ #104
> [ 6232.107835] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.80 09/13/2017
> [ 6232.110878] Workqueue: events amdgpu_vce_idle_work_handler [amdgpu]
> [ 6232.113457] Call Trace:
> [ 6232.116065]  dump_stack+0x85/0xc1
> [ 6232.117880]  ___might_sleep+0x28a/0x3c0
> [ 6232.119770]  __mutex_lock+0xc4/0x1520
> [ 6232.121756]  ? vce_v3_0_stop+0x3a/0x170 [amdgpu]
> [ 6232.123603]  ? mutex_lock_io_nested+0x13e0/0x13e0
> [ 6232.125447]  ? debug_check_no_locks_freed+0x2c0/0x2c0
> [ 6232.127333]  ? amdgpu_dpm_enable_vce+0x21a/0x330 [amdgpu]
> [ 6232.129253]  ? __mutex_lock+0xf9/0x1520
> [ 6232.131251]  ? amdgpu_dpm_enable_vce+0x21a/0x330 [amdgpu]
> [ 6232.133324]  ? pick_next_task_fair+0x550/0x1720
> [ 6232.135178]  ? amdgpu_dpm_enable_vce+0x21a/0x330 [amdgpu]
> [ 6232.137078]  ? vce_v3_0_stop+0x3a/0x170 [amdgpu]
> [ 6232.138948]  vce_v3_0_stop+0x3a/0x170 [amdgpu]
> [ 6232.140687]  amdgpu_device_ip_set_powergating_state+0x150/0x2f0 [amdgpu]
> [ 6232.142694]  smu7_powergate_vce+0x11d/0x1c0 [amdgpu]
> [ 6232.144336]  pp_dpm_powergate_vce+0xf4/0x160 [amdgpu]
> [ 6232.146283]  ? pp_set_clockgating_by_smu+0xe0/0xe0 [amdgpu]
> [ 6232.148007]  amdgpu_dpm_enable_vce+0x298/0x330 [amdgpu]
> [ 6232.149815]  process_one_work+0x715/0x1570
> [ 6232.151547]  ? pwq_dec_nr_in_flight+0x2b0/0x2b0
> [ 6232.153193]  ? lock_acquire+0x10b/0x350
> [ 6232.155001]  worker_thread+0xd4/0xef0
> [ 6232.156758]  ? process_one_work+0x1570/0x1570
> [ 6232.158595]  kthread+0x311/0x3d0
> [ 6232.160049]  ? kthread_create_worker_on_cpu+0xc0/0xc0
> [ 6232.161677]  ret_from_fork+0x27/0x50
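
For reference, the shape of the problem in the trace above, as a
simplified sketch with hypothetical lock names (not the actual amdgpu
locks): a caller holds a spinlock, which disables preemption and puts us
in atomic context, and the VCE stop path underneath then tries to take a
mutex, which may sleep:

#include <linux/spinlock.h>
#include <linux/mutex.h>

static DEFINE_SPINLOCK(example_pp_lock);   /* the mutex-turned-spinlock */
static DEFINE_MUTEX(example_hw_mutex);     /* taken deeper in the stop path */

static void example_powergate(void)
{
	spin_lock(&example_pp_lock);	/* preemption disabled: atomic context */

	/*
	 * mutex_lock() may sleep; with CONFIG_DEBUG_ATOMIC_SLEEP,
	 * ___might_sleep() flags exactly this as "BUG: sleeping
	 * function called from invalid context".
	 */
	mutex_lock(&example_hw_mutex);
	mutex_unlock(&example_hw_mutex);

	spin_unlock(&example_pp_lock);
}

Converting example_pp_lock back to a mutex, or dropping it before
calling into the stop path, removes the atomic context instead of
pushing GFP_ATOMIC and more spinlocks further down the stack.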