Re: Radeon driver hang with 4.9.0-rt

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 06, 2017 at 05:14:43PM +0100, Wargreen wrote:
> Hi,
> As you suggest, i've build the kernel 4.9 rt10 from the debian sources,
> with lockdep.
>
> Here the trace when system hang :
>
> [  339.062919] sysrq: SysRq : Show Locks Held
> [  339.062929]
> [  339.062929] Showing all locks held in the system:
> [  339.062947] 5 locks held by irq/1-i8042/126:
> [  339.062948]  #0: (&serio->lock){+.+...}, at: [<ffffffff8794dab8>] serio_interrupt+0x28/0x80
> [  339.062960]  #1: (&dev->event_lock){+.+...}, at: [<ffffffff87953cba>] input_event+0x3a/0x60
> [  339.062967]  #2: (rcu_read_lock){......}, at: [<ffffffff87952ee5>] input_pass_values.part.5+0x5/0x270
> [  339.062975]  #3: (rcu_read_lock){......}, at: [<ffffffff878c1eb5>] __handle_sysrq+0x5/0x220
> [  339.062982]  #4: (tasklist_lock){+.+...}, at: [<ffffffff874eae1d>] debug_show_all_locks+0x3d/0x1a0
> [  339.063009] 1 lock held by in:imklog/822:
> [  339.063011]  #0: (&f->f_pos_lock){+.+.+.}, at: [<ffffffff8768883a>] __fdget_pos+0x4a/0x50

The above two threads are irrelevant.

> [  339.063040] 2 locks held by Xorg/2365:
> [  339.063041]  #0: (&rdev->exclusive_lock){++++.+}, at: [<ffffffff874f3710>] rt_down_read+0x10/0x20
> [  339.063049]  #1: (&rdev->pm.mclk_lock){++++++}, at: [<ffffffff874f3710>] rt_down_read+0x10/0x20
> [  339.063100] 2 locks held by G.Main/3575:
> [  339.063101]  #0: (&rdev->pm.mclk_lock){++++++}, at: [<ffffffff874f3710>] rt_down_read+0x10/0x20
> [  339.063108]  #1: (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffc087b7b0>] ttm_bo_vm_fault+0x60/0x530 [ttm]

These two are suspicious, though.  Note that what it claimed is that
&rdev->pm.mclk_lock is acquired by both threads.  However:

> [  339.063124]
> [  339.063125] =============================================
> [  339.063125]
> [  363.523950] INFO: task Xorg:2365 blocked for more than 120 seconds.
> [  363.523959]       Not tainted 4.9.13-rt10 #1
> [  363.523961] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  363.523964] Xorg            D    0 2365   2363 0x00000004
> [  363.523970]  0000000000000000 ffff96333582b458 ffff9633027bd580 ffff96333582b440
> [  363.523975]  ffff96332f2b1c80 ffffba8646bffa50 ffffffff87af622f 00ffba8646bffb08
> [  363.523980]  ffff9633027bdbc0 ffff96333582b458 ffff96332841b800 ffff9633027bd580
> [  363.523984] Call Trace:
> [  363.523984] Call Trace:
> [  363.523992]  [<ffffffff87af622f>] ?  __schedule+0x32f/0x800
> [  363.523996]  [<ffffffff87af674c>] schedule+0x4c/0xe0
> [  363.523998]  [<ffffffff87af8146>] __rt_mutex_slowlock+0x56/0x1c0
> [  363.524001]  [<ffffffff87af8834>] rt_mutex_slowlock+0x114/0x360
> [  363.524005]  [<ffffffff87665615>] ?  get_empty_filp+0x175/0x250
> [  363.524007]  [<ffffffff87af8a9a>] rt_mutex_lock+0x1a/0x20
> [  363.524011]  [<ffffffff874f36e3>] __rt_down_read+0x53/0x70
> [  363.524014]  [<ffffffff874f3710>] ?  rt_down_read+0x10/0x20
> [  363.524016]  [<ffffffff874f3710>] rt_down_read+0x10/0x20
> [  363.524043]  [<ffffffffc0ae7d97>] radeon_bo_create+0x157/0x360 [radeon]
> [  363.524064]  [<ffffffffc0afb21d>] radeon_gem_object_create+0xbd/0x1a0 [radeon]
> [  363.524084]  [<ffffffffc0afb6ae>] radeon_gem_create_ioctl+0x6e/0x100 [radeon]
> [  363.524101]  [<ffffffffc081ef6a>] drm_ioctl+0x37a/0x4d0 [drm]
> [  363.524119]  [<ffffffffc0afb640>] ?  radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
> [  363.524122]  [<ffffffff877de437>] ?  debug_smp_processor_id+0x17/0x20
> [  363.524125]  [<ffffffff8748d8f6>] ?  unpin_current_cpu+0x16/0x70
> [  363.524128]  [<ffffffff874bce47>] ?  migrate_enable+0x87/0x150
> [  363.524131]  [<ffffffff87afb18f>] ?  rt_spin_unlock+0x2f/0x40
> [  363.524145]  [<ffffffffc0ac804f>] radeon_drm_ioctl+0x4f/0x90 [radeon]
> [  363.524148]  [<ffffffff87679fd3>] do_vfs_ioctl+0xa3/0x6f0
> [  363.524152]  [<ffffffff8768789c>] ?  __fget+0x11c/0x210
> [  363.524154]  [<ffffffff87687785>] ?  __fget+0x5/0x210
> [  363.524157]  [<ffffffff8767a699>] SyS_ioctl+0x79/0x90
> [  363.524159]  [<ffffffff87afb845>] system_call_fast_compare_end+0xc/0xaf

It appears that Xorg hasn't actually acquired the rwsem's underlying
rt_mutex, but is in the process of acquiring it, but is suspended
waiting on G.Main/3575 to release it.

> [  363.524263] 2 locks held by G.Main/3575:
> [  363.524264]  #0: (&rdev->pm.mclk_lock){++++++}, at: [<ffffffff874f3710>] rt_down_read+0x10/0x20
> [  363.524270]  #1: (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffc087b7b0>] ttm_bo_vm_fault+0x60/0x530 [ttm]

So, the question is: what is this G.Main/3575 thread, and what is it
doing?  Do you find it spinning?

Could be something fishy with w/w mutexes as well.

   Julia

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [RT Stable]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]

  Powered by Linux