On Thu, 2020-07-02 at 19:49 +0200, Daniel Lezcano wrote: > On 02/07/2020 19:19, Andrzej Pietrasiewicz wrote: > > Hi, > > > > W dniu 02.07.2020 o 19:01, Daniel Lezcano pisze: > > > On 02/07/2020 15:53, Andrzej Pietrasiewicz wrote: > > > > Hi Daniel, > > > > > > > > <snip> > > > > > > > > > > > > > > > > > > > > > > I did reproduce: > > > > > > > > > > > > > > > > > > v5.8-rc3 + series => imx6 hang at boot time > > > > > > > > > v5.8-rc3 => imx6 boots correctly > > > > > > > > > > So finally I succeeded to reproduce it on my imx7 locally. > > > > > The sensor > > > > > was failing to initialize for another reason related to the > > > > > legacy > > > > > cooling device, this is why it is not appearing on the imx7. > > > > > > > > > > I can now git-bisect :) > > > > > > > > > > > > > That would be very kind of you, thank you! > > > > > > With the lock correctness option enabled: > > > > > > [ 4.179223] imx_thermal tempmon: Extended Commercial CPU > > > temperature > > > grade - max:105C critical:100C passive:95C > > > [ 4.189557] > > > [ 4.191060] ============================================ > > > [ 4.196378] WARNING: possible recursive locking detected > > > [ 4.201699] 5.8.0-rc3-00011-gf5e50bf4d3ef #42 Not tainted > > > [ 4.207102] -------------------------------------------- > > > [ 4.212421] kworker/0:3/54 is trying to acquire lock: > > > [ 4.217480] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at: > > > thermal_zone_device_is_enabled+0x18/0x34 > > > [ 4.225777] > > > [ 4.225777] but task is already holding lock: > > > [ 4.231615] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at: > > > thermal_zone_get_temp+0x38/0x6c > > > [ 4.239121] > > > [ 4.239121] other info that might help us debug this: > > > [ 4.245655] Possible unsafe locking scenario: > > > [ 4.245655] > > > [ 4.251579] CPU0 > > > [ 4.254031] ---- > > > [ 4.256481] lock(&tz->lock); > > > [ 4.259544] lock(&tz->lock); > > > [ 4.262608] > > > [ 4.262608] *** DEADLOCK *** > > > [ 4.262608] > > > [ 4.268533] May be due to missing lock nesting notation > > > [ 4.268533] > > > [ 4.275329] 4 locks held by kworker/0:3/54: > > > [ 4.279517] #0: cb0066a8 ((wq_completion)events){+.+.}-{0:0}, > > > at: > > > process_one_work+0x224/0x808 > > > [ 4.288241] #1: ca075f10 (deferred_probe_work){+.+.}-{0:0}, > > > at: > > > process_one_work+0x224/0x808 > > > [ 4.296787] #2: cb1a48d8 (&dev->mutex){....}-{3:3}, at: > > > __device_attach+0x30/0x140 > > > [ 4.304468] #3: ca09a3e4 (&tz->lock){+.+.}-{3:3}, at: > > > thermal_zone_get_temp+0x38/0x6c > > > [ 4.312408] > > > [ 4.312408] stack backtrace: > > > [ 4.316778] CPU: 0 PID: 54 Comm: kworker/0:3 Not tainted > > > 5.8.0-rc3-00011-gf5e50bf4d3ef #42 > > > [ 4.325048] Hardware name: Freescale i.MX7 Dual (Device Tree) > > > [ 4.330809] Workqueue: events deferred_probe_work_func > > > [ 4.335973] [<c0312384>] (unwind_backtrace) from [<c030c580>] > > > (show_stack+0x10/0x14) > > > [ 4.343734] [<c030c580>] (show_stack) from [<c079d7d8>] > > > (dump_stack+0xe8/0x114) > > > [ 4.351062] [<c079d7d8>] (dump_stack) from [<c03abf78>] > > > (__lock_acquire+0xbfc/0x2cb4) > > > [ 4.358909] [<c03abf78>] (__lock_acquire) from [<c03ae9c4>] > > > (lock_acquire+0xf4/0x4e4) > > > [ 4.366758] [<c03ae9c4>] (lock_acquire) from [<c10630fc>] > > > (__mutex_lock+0xb0/0xaa8) > > > [ 4.374431] [<c10630fc>] (__mutex_lock) from [<c1063b10>] > > > (mutex_lock_nested+0x1c/0x24) > > > [ 4.382452] [<c1063b10>] (mutex_lock_nested) from [<c0d932c0>] > > > (thermal_zone_device_is_enabled+0x18/0x34) > > > [ 4.392036] [<c0d932c0>] (thermal_zone_device_is_enabled) from > > > [<c0d9da90>] (imx_get_temp+0x30/0x208) > > > [ 4.401271] [<c0d9da90>] (imx_get_temp) from [<c0d97484>] > > > (thermal_zone_get_temp+0x4c/0x6c) > > > [ 4.409640] [<c0d97484>] (thermal_zone_get_temp) from > > > [<c0d93df0>] > > > (thermal_zone_device_update+0x8c/0x258) > > > [ 4.419310] [<c0d93df0>] (thermal_zone_device_update) from > > > [<c0d9401c>] (thermal_zone_device_set_mode+0x60/0x88) > > > [ 4.429500] [<c0d9401c>] (thermal_zone_device_set_mode) from > > > [<c0d9e1d4>] (imx_thermal_probe+0x3e4/0x578) > > > [ 4.439082] [<c0d9e1d4>] (imx_thermal_probe) from [<c0a78388>] > > > (platform_drv_probe+0x48/0x98) > > > [ 4.447622] [<c0a78388>] (platform_drv_probe) from > > > [<c0a7603c>] > > > (really_probe+0x218/0x348) > > > [ 4.455903] [<c0a7603c>] (really_probe) from [<c0a76278>] > > > (driver_probe_device+0x5c/0xb4) > > > [ 4.464098] [<c0a76278>] (driver_probe_device) from > > > [<c0a743bc>] > > > (bus_for_each_drv+0x58/0xb8) > > > [ 4.472638] [<c0a743bc>] (bus_for_each_drv) from [<c0a75db0>] > > > (__device_attach+0xd4/0x140) > > > [ 4.480919] [<c0a75db0>] (__device_attach) from [<c0a750b0>] > > > (bus_probe_device+0x88/0x90) > > > [ 4.489112] [<c0a750b0>] (bus_probe_device) from [<c0a75564>] > > > (deferred_probe_work_func+0x68/0x98) > > > [ 4.498088] [<c0a75564>] (deferred_probe_work_func) from > > > [<c0369988>] > > > (process_one_work+0x2e0/0x808) > > > [ 4.507237] [<c0369988>] (process_one_work) from [<c036a150>] > > > (worker_thread+0x2a0/0x59c) > > > [ 4.515432] [<c036a150>] (worker_thread) from [<c0372208>] > > > (kthread+0x16c/0x178) > > > [ 4.522843] [<c0372208>] (kthread) from [<c0300174>] > > > (ret_from_fork+0x14/0x20) > > > [ 4.530074] Exception stack(0xca075fb0 to 0xca075ff8) > > > [ 4.535138] 5fa0: 00000000 > > > 00000000 00000000 00000000 > > > [ 4.543328] 5fc0: 00000000 00000000 00000000 00000000 00000000 > > > 00000000 00000000 00000000 > > > [ 4.551516] 5fe0: 00000000 00000000 00000000 00000000 00000013 > > > 00000000 > > > > > > > > > > > > > Thanks! > > > > That confirms your suspicions. > > > > So the reason is that ->get_temp() is called while the mutex is > > held and > > thermal_zone_device_is_enabled() wants to take the same mutex. > > Yes, that's correct. > > > Is adding a comment to thermal_zone_device_is_enabled() to never > > call > > it while the mutex is held and adding another version of it which > > does > > not take the mutex ok? > > The thermal_zone_device_is_enabled() is only used in two places, acpi > and this imx driver, and given: > > 1. as soon as the mutex is released, there is no guarantee the > thermal > zone won't be changed right after, the lock is pointless, thus the > information also. > > 2. from a design point of view, I don't see why a driver should know > if > a thermal zone is disabled or not > > It would make sense to end with this function and do not give the > different drivers an opportunity to access this information. I agree. > > Why not add change_mode for the acpi in order to enable or disable > the > events thermal_zone_device_is_enabled() is invoked in acpi thermal driver because we only want to do thermal_zone_device_update() when the acpi thermal zone is enabled. As thermal_zone_device_update() can handle a disabled thermal zone now, we can just remove the check. thanks, rui > and for imx_thermal use irq_enabled flag instead of the thermal > zone mode? Moreover it is very unclear why this function is needed in > imx_get_temp(), and I suspect we should be able to get rid of it. > >