Fwd: Re: [PATCH] drm/komeda: drop all currently held locks if deadlock happens

-----Original Message-----
From: Liu Lucas/刘保柱
Sent: August 1, 2023 16:59
To: 'Liviu Dudau' <liviu.dudau@xxxxxxx>
Cc: airlied@xxxxxxxxx; daniel@xxxxxxxx; Huang Menghui/黄梦辉 <menghui.huang@xxxxxxxxxxxx>
Subject: Re: Re: [PATCH] drm/komeda: drop all currently held locks if deadlock happens

OK, I will send a new patch later.

-----Original Message-----
From: Liviu Dudau <liviu.dudau@xxxxxxx>
Sent: August 1, 2023 16:38
To: Liu Lucas/刘保柱 <lucas.liu@xxxxxxxxxxxx>
Cc: airlied@xxxxxxxxx; daniel@xxxxxxxx; Huang Menghui/黄梦辉 <menghui.huang@xxxxxxxxxxxx>
Subject: Re: Re: [PATCH] drm/komeda: drop all currently held locks if deadlock happens

On Tue, Aug 01, 2023 at 02:16:46AM +0000, Liu Lucas/刘保柱 wrote:
> Hello,

Hi Liu,

> 
> I'm sorry that the previous modification was not explained in detail.
> (1). We are running deadlock detection on the kernel, so we enabled the following lock debugging options:
> CONFIG_LOCKDEP=y
> CONFIG_PROVE_LOCKING=y
> CONFIG_LOCK_STAT=y
> CONFIG_DEBUG_RT_MUTEXES=y
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_MUTEXES=y
> CONFIG_DEBUG_RWSEMS=y
> CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
> CONFIG_DEBUG_LOCK_ALLOC=y
> CONFIG_DEBUG_ATOMIC_SLEEP=y
> CONFIG_PROVE_RAW_LOCK_NESTING=y
> 
> With CONFIG_DEBUG_WW_MUTEX_SLOWPATH enabled, the following warnings were triggered:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 345 at drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c:1248 komeda_release_unclaimed_resources+0x13c/0x170
> Modules linked in:
> CPU: 2 PID: 345 Comm: composer@2.1-se Kdump: loaded Tainted: G        W         5.10.110-SE-SDK1.8-dirty #16
> Hardware name: Siengine Se1000 Evaluation board (DT)
> pstate: 20400009 (nzCv daif +PAN -UAO -TCO BTYPE=--)
> pc : komeda_release_unclaimed_resources+0x13c/0x170
> lr : komeda_release_unclaimed_resources+0xbc/0x170
> sp : ffff800017b8b8d0
> pmr_save: 000000e0
> x29: ffff800017b8b8d0 x28: ffff000cf2f96200
> x27: ffff000c8f5a8800 x26: 0000000000000000
> x25: 0000000000000038 x24: ffff8000116a0140
> x23: 0000000000000038 x22: ffff000cf2f96200
> x21: ffff000cfc300300 x20: ffff000c8ab77080
> x19: 0000000000000003 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000
> x15: b400e638f738ba38 x14: 0000000000000000
> x13: 0000000106400a00 x12: 0000000000000000
> x11: 0000000000000000 x10: 0000000000000000
> x9 : ffff800012f80000 x8 : ffff000ca3308000
> x7 : 0000000ff3000000 x6 : ffff80001084034c
> x5 : ffff800017b8bc40 x4 : 000000000000000f
> x3 : ffff000ca3308000 x2 : 0000000000000000
> x1 : 0000000000000000 x0 : ffffffffffffffdd
> Call trace:
> komeda_release_unclaimed_resources+0x13c/0x170
> komeda_crtc_atomic_check+0x68/0xf0
> drm_atomic_helper_check_planes+0x138/0x1f4
> komeda_kms_check+0x284/0x36c
> drm_atomic_check_only+0x40c/0x714
> drm_atomic_nonblocking_commit+0x1c/0x60
> drm_mode_atomic_ioctl+0xa3c/0xb8c
> drm_ioctl_kernel+0xc4/0x120
> drm_ioctl+0x268/0x534
> __arm64_sys_ioctl+0xa8/0xf0
> el0_svc_common.constprop.0+0x80/0x240
> do_el0_svc+0x24/0x90
> el0_svc+0x20/0x30
> el0_sync_handler+0xe8/0xf0
> el0_sync+0x1a4/0x1c0
> irq event stamp: 0
> hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> hardirqs last disabled at (0): [<ffff800010056d34>] copy_process+0x5d0/0x183c
> softirqs last  enabled at (0): [<ffff800010056d34>] copy_process+0x5d0/0x183c
> softirqs last disabled at (0): [<0000000000000000>] 0x0
> ---[ end trace 20ae984fa860184a ]---
> ------------[ cut here ]------------
> WARNING: CPU: 3 PID: 345 at drivers/gpu/drm/drm_modeset_lock.c:228 drm_modeset_drop_locks+0x84/0x90
> Modules linked in:
> CPU: 3 PID: 345 Comm: composer@2.1-se Kdump: loaded Tainted: G        W         5.10.110-SE-SDK1.8-dirty #16
> Hardware name: Siengine Se1000 Evaluation board (DT)
> pstate: 20400009 (nzCv daif +PAN -UAO -TCO BTYPE=--)
> pc : drm_modeset_drop_locks+0x84/0x90
> lr : drm_mode_atomic_ioctl+0x860/0xb8c
> sp : ffff800017b8bb10
> pmr_save: 000000e0
> x29: ffff800017b8bb10 x28: 0000000000000001
> x27: 0000000000000038 x26: 0000000000000002
> x25: ffff000cecbefa00 x24: ffff000cf2f96200
> x23: 0000000000000001 x22: 0000000000000018
> x21: 0000000000000001 x20: ffff800017b8bc10
> x19: 0000000000000000 x18: 0000000000000000
> x17: 0000000002e8bf2c x16: 0000000002e94c6b
> x15: 0000000002ea48b9 x14: ffff8000121f0300
> x13: 0000000002ee2ca8 x12: ffff80001129cae0
> x11: ffff800012435000 x10: ffff000ed46b5e88
> x9 : ffff000c9935e600 x8 : 0000000000000000
> x7 : 000000008020001e x6 : 000000008020001f
> x5 : ffff80001085fbe0 x4 : fffffe0033a59f20
> x3 : 000000008020001e x2 : 0000000000000000
> x1 : 0000000000000000 x0 : ffff000c8f596090
> Call trace:
> drm_modeset_drop_locks+0x84/0x90
> drm_mode_atomic_ioctl+0x860/0xb8c
> drm_ioctl_kernel+0xc4/0x120
> drm_ioctl+0x268/0x534
> __arm64_sys_ioctl+0xa8/0xf0
> el0_svc_common.constprop.0+0x80/0x240
> do_el0_svc+0x24/0x90
> el0_svc+0x20/0x30
> el0_sync_handler+0xe8/0xf0
> el0_sync+0x1a4/0x1c0
> irq event stamp: 0
> hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> hardirqs last disabled at (0): [<ffff800010056d34>] copy_process+0x5d0/0x183c
> softirqs last  enabled at (0): [<ffff800010056d34>] copy_process+0x5d0/0x183c
> softirqs last disabled at (0): [<0000000000000000>] 0x0
> ---[ end trace 20ae984fa860184b ]---
> 
> (2). According to the call trace, the first warning comes from WARN_ON(IS_ERR(c_st)) in komeda_pipeline_unbound_components(). Following the calls made from that function:
> komeda_pipeline_unbound_components
> -> komeda_component_get_state_and_set_user
>   -> komeda_pipeline_get_state_and_set_crtc
>     -> komeda_pipeline_get_state
>       -> drm_atomic_get_private_obj_state
>         -> drm_modeset_lock
> 
> 
> komeda_pipeline_unbound_components
> -> komeda_component_get_state_and_set_user
>   -> komeda_component_get_state
>     -> drm_atomic_get_private_obj_state
>      -> drm_modeset_lock
> 
> ret = drm_modeset_lock(&obj->lock, state->acquire_ctx);
> if (ret)
> 	return ERR_PTR(ret);
> Here it returns -EDEADLK.
> 
> (3). Therefore, we handle the deadlock as suggested by [1], using the
>     function drm_modeset_backoff().
>     [1] https://docs.kernel.org/gpu/drm-kms.html?highlight=kms#kms-locking
> 
> According to the call trace information:
> Call trace:
> komeda_release_unclaimed_resources+0x13c/0x170
> komeda_crtc_atomic_check+0x68/0xf0
> drm_atomic_helper_check_planes+0x138/0x1f4
> komeda_kms_check+0x284/0x36c
> drm_atomic_check_only+0x40c/0x714
> drm_atomic_nonblocking_commit+0x1c/0x60
> drm_mode_atomic_ioctl+0xa3c/0xb8c
> drm_ioctl_kernel+0xc4/0x120
> drm_ioctl+0x268/0x534

This is a much better description of the problem that contains the relevant information for me. Can you please send a new revision of the patch where the commit message is the text above? With that, you can have my Reviewed-by tag.

Best regards,
Liviu


> 
> We make komeda_pipeline_unbound_components() return an error code. komeda_release_unclaimed_resources() then propagates it back up to drm_mode_atomic_ioctl(), which calls drm_modeset_backoff() when the value of ret is -EDEADLK.
> 
> (4). The WARN_ON(IS_ERR(c_st)) check can be retained; the original modification was not carefully considered. I will revise the patch and submit a new version along these lines:
>                 c_st = komeda_component_get_state_and_set_user(c,
>                                 drm_st, NULL, new->crtc);
> -               WARN_ON(IS_ERR(c_st));
> +               if (PTR_ERR(c_st) == -EDEADLK)
> +                       return -EDEADLK;
> +               else
> +                       WARN_ON(IS_ERR(c_st));
> 
> -----Original Message-----
> From: Liviu Dudau <liviu.dudau@xxxxxxx>
> Sent: July 31, 2023 22:11
> To: Huang Menghui/黄梦辉 <menghui.huang@xxxxxxxxxxxx>
> Cc: airlied@xxxxxxxxx; daniel@xxxxxxxx; Liu Lucas/刘保柱 <lucas.liu@xxxxxxxxxxxx>
> Subject: Re: [PATCH] drm/komeda: drop all currently held locks if deadlock happens
> 
> Hello,
> 
> On Mon, Jul 31, 2023 at 04:08:43PM +0800, menghui.huang wrote:
> > From: "baozhu.liu" <lucas.liu@xxxxxxxxxxxx>
> > 
> > If komeda_pipeline_unbound_components() returns -EDEADLK, it means 
> > that a deadlock happened in the locking context.
> > Currently, komeda is not dealing with the deadlock properly, 
> > producing the following output when CONFIG_DEBUG_WW_MUTEX_SLOWPATH is enabled:
> > 
> > ------------[ cut here ]------------
> > WARNING: CPU: 2 PID: 345 at drivers/gpu/drm/drm_modeset_lock.c:228 drm_modeset_drop_locks+0x84/0x90
> > Modules linked in:
> > CPU: 2 PID: 345 Comm: composer@2.1-se Kdump: loaded Tainted: G        W
> > Hardware name: Siengine Se1000 Evaluation board (DT)
> > pstate: 20400009 (nzCv daif +PAN -UAO -TCO BTYPE=--)
> > pc : drm_modeset_drop_locks+0x84/0x90
> > lr : drm_mode_atomic_ioctl+0x860/0xb8c
> > sp : ffff800017b8bb10
> > pmr_save: 000000e0
> > x29: ffff800017b8bb10 x28: 0000000000000001
> > x27: 0000000000000038 x26: 0000000000000002
> > x25: ffff000d03a6cc00 x24: ffff000cf2fe9000
> > x23: 0000000000000001 x22: 0000000000000018
> > x21: 0000000000000001 x20: ffff800017b8bc10
> > x19: 0000000000000000 x18: 0000000000000000
> > x17: 000000000000a2bb x16: 0000000000f77f9a
> > x15: 00000000025a0830 x14: ffff8000121f0300
> > x13: 0000000005740083 x12: ffff80001129cae0
> > x11: ffff800012435000 x10: ffff000ed46b5e88
> > x9 : ffff000c9935e600 x8 : 0000000000000000
> > x7 : 0000000000000001 x6 : 00000000000ba57a
> > x5 : ffff000cf458e400 x4 : ffff000ed4c528a0
> > x3 : 00000000000ba582 x2 : 0000000000000000
> > x1 : 0000000000000000 x0 : ffff000c8f597290
> > Call trace:
> > drm_modeset_drop_locks+0x84/0x90
> > drm_mode_atomic_ioctl+0x860/0xb8c
> > drm_ioctl_kernel+0xc4/0x120
> > drm_ioctl+0x268/0x534
> > __arm64_sys_ioctl+0xa8/0xf0
> > el0_svc_common.constprop.0+0x80/0x240
> > do_el0_svc+0x24/0x90
> > el0_svc+0x20/0x30
> > el0_sync_handler+0xe8/0xf0
> > el0_sync+0x1a4/0x1c0
> > irq event stamp: 0
> > hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> > hardirqs last disabled at (0): [<ffff800010056d34>] copy_process+0x5d0/0x183c
> > softirqs last  enabled at (0): [<ffff800010056d34>] copy_process+0x5d0/0x183c
> > softirqs last disabled at (0): [<0000000000000000>] 0x0
> > ---[ end trace 20ae984fa8601849 ]---
> > 
> > Therefore, handle this deadlock by returning -EDEADLK, so that the
> > drm_mode_atomic_ioctl function enters its drm_modeset_backoff
> > processing flow.
> > 
> > Signed-off-by: baozhu.liu <lucas.liu@xxxxxxxxxxxx>
> > ---
> >  .../gpu/drm/arm/display/komeda/komeda_pipeline_state.c | 10 ++++++----
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c b/drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c
> > index 3276a3e82c62..8ad021259a37 100644
> > --- a/drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c
> > +++ b/drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c
> > @@ -1223,7 +1223,7 @@ int komeda_build_display_data_flow(struct komeda_crtc *kcrtc,
> >  	return 0;
> >  }
> >  
> > -static void
> > +static int
> >  komeda_pipeline_unbound_components(struct komeda_pipeline *pipe,
> >  				   struct komeda_pipeline_state *new)
> >  {
> > @@ -1243,8 +1243,11 @@ komeda_pipeline_unbound_components(struct komeda_pipeline *pipe,
> >  		c = komeda_pipeline_get_component(pipe, id);
> >  		c_st = komeda_component_get_state_and_set_user(c,
> >  				drm_st, NULL, new->crtc);
> > -		WARN_ON(IS_ERR(c_st));
> > +		if (PTR_ERR(c_st) == -EDEADLK)
> > +			return -EDEADLK;
> >  	}
> > +
> > +	return 0;
> >  }
> >  
> >  /* release unclaimed pipeline resource */
> > @@ -1266,9 +1269,8 @@ int komeda_release_unclaimed_resources(struct komeda_pipeline *pipe,
> >  	if (WARN_ON(IS_ERR_OR_NULL(st)))
> >  		return -EINVAL;
> >  
> > -	komeda_pipeline_unbound_components(pipe, st);
> > +	return komeda_pipeline_unbound_components(pipe, st);
> >  
> > -	return 0;
> >  }
> >  
> >  /* Since standalone disabled components must be disabled separately and in the
> > --
> > 2.17.1
> > 
> 
> Thank you for sending this patch, but the commit message needs a bit more clarity about what you're trying to do and why you think this patch solves it.
> 
> First of all, the commit title talks about dropping locks which is obviously not what the patch is doing here. Second, the patch replaces a WARN_ON() which I would expect to be printed in the log fragment that you have provided, but instead I see the WARNING() from drm_modeset_lock.c:228. I cannot see from the call trace where you've hit a komeda function call, so something is missing.
> 
> Can you also provide us with information on the kernel version you're using?
> For me, line 228 is on an empty line inside drm_warn_on_modeset_not_all_locked(), and drm_modeset_drop_locks() only starts at line 274.
> 
> Best regards,
> Liviu
> 

--
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯
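
For readers following the thread, the error flow discussed in points (2) to (4) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the komeda or DRM code: ERR_PTR(), PTR_ERR() and IS_ERR() are re-implemented here as stand-ins for the kernel helpers, and get_state(), unbound_components(), commit_with_retry() and contended_attempts are hypothetical names invented for this example. The retry loop stands in for the place where, per point (3), drm_mode_atomic_ioctl() would call drm_modeset_backoff().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error codes live in the last MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lock state: the lookup is contended this many times. */
static int contended_attempts = 1;
static int dummy_state;

/* Hypothetical state lookup that reports contention as ERR_PTR(-EDEADLK). */
static void *get_state(void)
{
	if (contended_attempts > 0) {
		contended_attempts--;
		return ERR_PTR(-EDEADLK);
	}
	return &dummy_state;
}

/* Same shape as the check in point (4): propagate -EDEADLK, warn otherwise. */
static int unbound_components(void)
{
	void *c_st = get_state();

	if (PTR_ERR(c_st) == -EDEADLK)
		return -EDEADLK;
	if (IS_ERR(c_st))	/* stands in for WARN_ON(IS_ERR(c_st)) */
		fprintf(stderr, "unexpected error %ld\n", PTR_ERR(c_st));

	return 0;
}

/* Caller-side retry: repeat the check phase while it reports -EDEADLK. */
static int commit_with_retry(int *attempts)
{
	int ret;

	assert(attempts != NULL);
	*attempts = 0;
	do {
		(*attempts)++;
		ret = unbound_components();
		/* The kernel caller would drop its held locks here before retrying. */
	} while (ret == -EDEADLK);

	return ret;
}
```

With contended_attempts left at 1, calling commit_with_retry(&attempts) returns 0 and leaves attempts at 2: the first pass propagates -EDEADLK instead of warning, and the second pass succeeds.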



