On 2018-06-07 10:44 AM, Michel Dänzer wrote:
> On 2018-05-31 08:05 PM, Christian König wrote:
>> On 2018-05-30 06:03 PM, Harry Wentland wrote:
>>> On 2018-05-30 06:17 AM, Shirish S wrote:
>>>> This patch fixes the warning messages caused by calling a sleeping
>>>> function in atomic context, as below:
>>>>
>>>> BUG: sleeping function called from invalid context at mm/slab.h:419
>>>> in_atomic(): 1, irqs_disabled(): 1, pid: 5, name: kworker/u4:0
>>>> CPU: 1 PID: 5 Comm: kworker/u4:0 Tainted: G       W      4.14.35 #941
>>>> Workqueue: events_unbound commit_work
>>>> Call Trace:
>>>>  dump_stack+0x4d/0x63
>>>>  ___might_sleep+0x11f/0x12e
>>>>  kmem_cache_alloc_trace+0x41/0xea
>>>>  dc_create_state+0x1f/0x30
>>>>  dc_commit_updates_for_stream+0x73/0x4cf
>>>>  ? amdgpu_get_crtc_scanoutpos+0x82/0x16b
>>>>  amdgpu_dm_do_flip+0x239/0x298
>>>>  amdgpu_dm_commit_planes.isra.23+0x379/0x54b
>>>>  ? dc_commit_state+0x3da/0x404
>>>>  amdgpu_dm_atomic_commit_tail+0x4fc/0x5d2
>>>>  ? wait_for_common+0x5b/0x69
>>>>  commit_tail+0x42/0x64
>>>>  process_one_work+0x1b0/0x314
>>>>  worker_thread+0x1cb/0x2c1
>>>>  ? create_worker+0x1da/0x1da
>>>>  kthread+0x156/0x15e
>>>>  ? kthread_flush_work+0xea/0xea
>>>>  ret_from_fork+0x22/0x40
>>>>
>>>> Signed-off-by: Shirish S <shirish.s at amd.com>
>>>> ---
>>>>  drivers/gpu/drm/amd/display/dc/core/dc.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
>>>> index 33149ed..d62206f 100644
>>>> --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
>>>> +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
>>>> @@ -588,7 +588,7 @@ static void disable_dangling_plane(struct dc *dc, struct dc_state *context)
>>>>
>>>>  struct dc *dc_create(const struct dc_init_data *init_params)
>>>>  {
>>>> -   struct dc *dc = kzalloc(sizeof(*dc), GFP_KERNEL);
>>>> +   struct dc *dc = kzalloc(sizeof(*dc), GFP_ATOMIC);
>>> Are you sure this one can be called in atomic context?
>>>
>>> If so, then everything in construct() would also need GFP_ATOMIC.
>>
>> Well, the backtrace is quite obvious, but I agree that the change still
>> looks fishy to me as well.
>>
>> Using GFP_ATOMIC should only be a last resort when nothing else helps,
>> but here it looks more like we misuse a spinlock where a mutex or
>> semaphore would be more appropriate.
>
> I want to raise this concern again, for all of these patches.
>
> I'm now seeing similar BUG messages triggered from VCE functions; see an
> example below. I had never seen these before today, so I assume they're
> caused by the "drm/amdgpu/pp: replace mutex with spin_lock (V3)" change.
>
> If we just mechanically convert a mutex to a spinlock whenever this
> warning appears, it could snowball, and we might end up converting a lot
> of mutexes to spinlocks. That could be bad if any of those locks can be
> held for significant amounts of time.
>
> Instead, we should look into why we end up in atomic context in the
> first place. All of the messages in these patches were triggered from
> either a worker thread or an ioctl, neither of which runs in atomic
> context per se. Most likely we end up in atomic context because a
> spinlock is held, so maybe the problem can be solved by converting those
> spinlocks to mutexes instead.

I fully agree with Michel. Shirish, please follow up on this.
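
To make the direction concrete, here is a minimal sketch of what Michel
is describing, with hypothetical "example" names rather than the actual
amdgpu locks: instead of switching allocations to GFP_ATOMIC so they
survive under a spinlock, relax the lock that creates the atomic context
to a mutex, so sleeping calls like kzalloc(GFP_KERNEL) stay legal:

#include <linux/mutex.h>
#include <linux/slab.h>

struct example_ctx {
	struct mutex lock;	/* was: spinlock_t lock */
	void *state;
};

/* mutex_init(&ctx->lock) is assumed to happen at creation time. */
static int example_update_state(struct example_ctx *ctx)
{
	void *state;

	/*
	 * mutex_lock() may sleep, which is fine here: worker threads
	 * and ioctls are not atomic by themselves.
	 */
	mutex_lock(&ctx->lock);

	/* No spinlock held, so a sleeping allocation is legal again. */
	state = kzalloc(64, GFP_KERNEL);
	if (!state) {
		mutex_unlock(&ctx->lock);
		return -ENOMEM;
	}

	kfree(ctx->state);
	ctx->state = state;

	mutex_unlock(&ctx->lock);
	return 0;
}

The tradeoff is that a mutex cannot be taken from interrupt context or
with preemption disabled, so this only works if every path that takes
the lock is allowed to sleep.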
Harry

> [ 6232.096387] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
> [ 6232.099544] in_atomic(): 1, irqs_disabled(): 0, pid: 14785, name: kworker/3:14
> [ 6232.102183] INFO: lockdep is turned off.
> [ 6232.104965] CPU: 3 PID: 14785 Comm: kworker/3:14 Tainted: G B D W OE 4.16.0-rc7+ #104
> [ 6232.107835] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.80 09/13/2017
> [ 6232.110878] Workqueue: events amdgpu_vce_idle_work_handler [amdgpu]
> [ 6232.113457] Call Trace:
> [ 6232.116065]  dump_stack+0x85/0xc1
> [ 6232.117880]  ___might_sleep+0x28a/0x3c0
> [ 6232.119770]  __mutex_lock+0xc4/0x1520
> [ 6232.121756]  ? vce_v3_0_stop+0x3a/0x170 [amdgpu]
> [ 6232.123603]  ? mutex_lock_io_nested+0x13e0/0x13e0
> [ 6232.125447]  ? debug_check_no_locks_freed+0x2c0/0x2c0
> [ 6232.127333]  ? amdgpu_dpm_enable_vce+0x21a/0x330 [amdgpu]
> [ 6232.129253]  ? __mutex_lock+0xf9/0x1520
> [ 6232.131251]  ? amdgpu_dpm_enable_vce+0x21a/0x330 [amdgpu]
> [ 6232.133324]  ? pick_next_task_fair+0x550/0x1720
> [ 6232.135178]  ? amdgpu_dpm_enable_vce+0x21a/0x330 [amdgpu]
> [ 6232.137078]  ? vce_v3_0_stop+0x3a/0x170 [amdgpu]
> [ 6232.138948]  vce_v3_0_stop+0x3a/0x170 [amdgpu]
> [ 6232.140687]  amdgpu_device_ip_set_powergating_state+0x150/0x2f0 [amdgpu]
> [ 6232.142694]  smu7_powergate_vce+0x11d/0x1c0 [amdgpu]
> [ 6232.144336]  pp_dpm_powergate_vce+0xf4/0x160 [amdgpu]
> [ 6232.146283]  ? pp_set_clockgating_by_smu+0xe0/0xe0 [amdgpu]
> [ 6232.148007]  amdgpu_dpm_enable_vce+0x298/0x330 [amdgpu]
> [ 6232.149815]  process_one_work+0x715/0x1570
> [ 6232.151547]  ? pwq_dec_nr_in_flight+0x2b0/0x2b0
> [ 6232.153193]  ? lock_acquire+0x10b/0x350
> [ 6232.155001]  worker_thread+0xd4/0xef0
> [ 6232.156758]  ? process_one_work+0x1570/0x1570
> [ 6232.158595]  kthread+0x311/0x3d0
> [ 6232.160049]  ? kthread_create_worker_on_cpu+0xc0/0xc0
> [ 6232.161677]  ret_from_fork+0x27/0x50
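
For reference, the shape of the problem in the trace above, as a
simplified sketch with hypothetical lock names (not the actual amdgpu
locks): a caller holds a spinlock, which disables preemption and puts us
in atomic context, and the VCE stop path underneath then tries to take a
mutex, which may sleep:

#include <linux/spinlock.h>
#include <linux/mutex.h>

static DEFINE_SPINLOCK(example_pp_lock);   /* the mutex-turned-spinlock */
static DEFINE_MUTEX(example_hw_mutex);     /* taken deeper in the stop path */

static void example_powergate(void)
{
	spin_lock(&example_pp_lock);	/* preemption disabled: atomic context */

	/*
	 * mutex_lock() may sleep; with CONFIG_DEBUG_ATOMIC_SLEEP,
	 * ___might_sleep() flags exactly this as "BUG: sleeping
	 * function called from invalid context".
	 */
	mutex_lock(&example_hw_mutex);
	mutex_unlock(&example_hw_mutex);

	spin_unlock(&example_pp_lock);
}

Converting example_pp_lock back to a mutex, or dropping it before
calling into the stop path, removes the atomic context instead of
pushing GFP_ATOMIC and more spinlocks further down the stack.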