On 21/05/2021 13:30, Tomi Valkeinen wrote:
On 21/05/2021 12:14, Tony Lindgren wrote:
* Tomi Valkeinen <tomi.valkeinen@xxxxxxxxxxxxxxxx> [210521 08:45]:
On 21/05/2021 10:39, Tony Lindgren wrote:
* Tomi Valkeinen <tomi.valkeinen@xxxxxxxxxxxxxxxx> [210521 07:05]:
On 21/05/2021 08:36, Tony Lindgren wrote:
* Tomi Valkeinen <tomi.valkeinen@xxxxxxxxxxxxxxxx> [210520 08:27]:
Hi,
I've noticed that the v5.13 rcs crash randomly (but quite often)
on dra76 evm
(I haven't tested other boards). Anyone else seen this problem?
I have not seen this so far and beagle-x15 is behaving for me.
Does it always happen on boot?
No, but quite often. I can't really say how often, as it's
annoyingly random.
I tried to bisect, but that proved to be difficult as sometimes I
get multiple (5+)
successful boots before the crash.
I tested with x15, same issue (below). So... Something in my kernel
config? Or compiler?
Looks like the crash happens always very soon after (or during)
probing palmas.
After about 10 reboots with your .config I'm seeing it now too on
beagle-x15. So far no luck reproducing it with omap2plus_defconfig.
I think I have an easy way to see if a kernel is good or bad, by
printing
stack_not_used(current) in the first call to omap_i2c_xfer_irq().
There's a
huge drop between v5.12 and v5.13-rc1.
And interestingly, sometimes a simple printk seems to use hundreds of
bytes
of stack (i.e. compare stack usage before and after the print). But not
always. So maybe the issue is somehow related to printk.
I'm bisecting.
OK sounds good to me.
Well, I found the bad commit but unfortunately it doesn't exactly point
where the issue is.
f483a3e123410bd1c78af295bf65feffb6769a98 is the first bad commit
commit f483a3e123410bd1c78af295bf65feffb6769a98
Author: Tony Lindgren <tony@xxxxxxxxxxx>
Date: Wed Mar 10 14:03:48 2021 +0200
ARM: dts: Configure simple-pm-bus for dra7 l4_per1
We can now probe interconnects with device tree only configuration
using
simple-pm-bus and genpd.
Tested-by: Kishon Vijay Abraham I <kishon@xxxxxx>
Signed-off-by: Tony Lindgren <tony@xxxxxxxxxxx>
arch/arm/boot/dts/dra7-l4.dtsi | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
The difference is clear, though. With
9a75368b6426739e8b798592f084cb682d760568, which is the last good commit,
when I print the stack usage with stack_not_used() in three different
places in omap_i2c_xfer_irq(), I get always prints roughly like:
STACK FREE omap_i2c_xfer_irq: 2972, 2972, 2972
And these repeat exactly the same for each call to omap_i2c_xfer_irq (at
least during palmas probe).
With the bad commit the situation is different. The first call to
omap_i2c_xfer_irq prints:
STACK FREE omap_i2c_xfer_irq: 2024, 2024, 2024
so we're already using 1k more. But then, instead of the stack usage
staying the same, consecutive calls show increased stack usage. It
doesn't increase for each xfer call, but after about 10 calls, I'm
getting ~1800, ten calls more I see ~800, and going down to ~500.
However, with this bad commit, I don't see the empty stack going below
~500, so I don't get crashes. But going to a more recent commit, like
01d7136894410a71932096e0fb9f1d301b6ccf07, the situation is much worse.
The first print shows:
STACK FREE omap_i2c_xfer_irq: 1164, 1164, 1164
and it quickly goes to stack overflow.
Well... I guess there's no bug as such. The simple-pm-bus just uses a
lot more stack. Looking at the reports from CONFIG_STACK_TRACER, in the
good commit the biggest stack is in USB probing with depth of 4896. With
01d7136894410a71932096e0fb9f1d301b6ccf07, and a debug print in
omap_i2c_xfer_common, I can get to 7544.
I don't know what to do about this, so I'm giving up for the time being
and "fix" it for myself by increasing THREAD_SIZE_ORDER.
# cat /sys/kernel/debug/tracing/stack_trace
Depth Size Location (143 entries)
----- ---- --------
0) 7544 28 ftrace_graph_call+0x0/0xc
1) 7516 28 rcu_read_lock_sched_held+0x38/0x88
2) 7488 120 lock_acquire+0x1ac/0x560
3) 7368 16 _raw_spin_lock_irqsave+0x64/0x80
4) 7352 56 timekeeping_advance+0x3c/0x934
5) 7296 144 update_wall_time+0x1c/0x20
6) 7152 16 tick_do_update_jiffies64+0x1f0/0x304
7) 7136 104 tick_sched_timer+0xf0/0x10c
8) 7032 96 __hrtimer_run_queues+0x2d8/0x68c
9) 6936 112 hrtimer_interrupt+0x158/0x388
10) 6824 16 dmtimer_clockevent_interrupt+0x34/0x3c
11) 6808 64 __handle_irq_event_percpu+0x108/0x384
12) 6744 40 handle_irq_event+0x68/0xe0
13) 6704 32 handle_fasteoi_irq+0xbc/0x224
14) 6672 40 __handle_domain_irq+0x98/0x10c
15) 6632 48 gic_handle_irq+0x9c/0xc4
16) 6584 68 __irq_svc+0x70/0x98
17) 6516 60 console_unlock+0x444/0x688
18) 6456 196 vprintk_emit+0x138/0x344
19) 6260 68 vprintk_default+0x34/0x3c
20) 6192 48 vprintk_func+0xec/0x194
21) 6144 24 printk+0x3c/0x64
22) 6120 104 omap_i2c_xfer_common+0x698/0x6e4
23) 6016 16 omap_i2c_xfer_irq+0x1c/0x20
24) 6000 80 __i2c_transfer+0x184/0xa80
25) 5920 40 i2c_transfer+0xa8/0x144
26) 5880 64 regmap_i2c_read+0x6c/0xa8
27) 5816 72 _regmap_raw_read+0x138/0x474
28) 5744 40 _regmap_bus_read+0x54/0x80
29) 5704 56 _regmap_read+0x74/0x248
30) 5648 32 regmap_read+0x4c/0x70
31) 5616 32 regulator_get_voltage_sel_regmap+0x40/0x94
32) 5584 32 regulator_attr_is_visible+0x18c/0x278
33) 5552 88 internal_create_group+0xe4/0x3e0
34) 5464 40 internal_create_groups.part.0+0x54/0x98
35) 5424 16 sysfs_create_groups+0x24/0x30
36) 5408 96 device_add+0x280/0x944
37) 5312 104 regulator_register+0x694/0xc4c
38) 5208 32 devm_regulator_register+0x48/0x84
39) 5176 72 palmas_smps_registration+0x1b4/0x480
40) 5104 136 palmas_regulators_probe+0x348/0x450
41) 4968 32 platform_probe+0x68/0xc8
42) 4936 64 really_probe+0x114/0x588
43) 4872 48 driver_probe_device+0x88/0x1f0
44) 4824 32 __device_attach_driver+0xac/0x118
45) 4792 48 bus_for_each_drv+0x90/0xe0
46) 4744 48 __device_attach+0xe0/0x1ec
47) 4696 16 device_initial_probe+0x1c/0x20
48) 4680 32 bus_probe_device+0x98/0xa0
49) 4648 96 device_add+0x37c/0x944
50) 4552 16 of_device_add+0x44/0x4c
51) 4536 40 of_platform_device_create_pdata+0xa0/0xcc
52) 4496 112 of_platform_bus_create+0x1bc/0x34c
53) 4384 48 of_platform_populate+0x90/0x124
54) 4336 32 devm_of_platform_populate+0x4c/0x90
55) 4304 88 palmas_i2c_probe+0x478/0x608
56) 4216 40 i2c_device_probe+0x2a0/0x2f8
57) 4176 64 really_probe+0x114/0x588
58) 4112 48 driver_probe_device+0x88/0x1f0
59) 4064 32 __device_attach_driver+0xac/0x118
60) 4032 48 bus_for_each_drv+0x90/0xe0
61) 3984 48 __device_attach+0xe0/0x1ec
62) 3936 16 device_initial_probe+0x1c/0x20
63) 3920 32 bus_probe_device+0x98/0xa0
64) 3888 96 device_add+0x37c/0x944
65) 3792 24 device_register+0x24/0x28
66) 3768 48 i2c_new_client_device+0x140/0x264
67) 3720 112 of_i2c_register_devices+0xdc/0x188
68) 3608 64 i2c_register_adapter+0x220/0x700
69) 3544 32 __i2c_add_numbered_adapter+0x60/0xac
70) 3512 32 i2c_add_adapter+0xa0/0xe4
71) 3480 16 i2c_add_numbered_adapter+0x2c/0x30
72) 3464 80 omap_i2c_probe+0x404/0x6b8
73) 3384 32 platform_probe+0x68/0xc8
74) 3352 64 really_probe+0x114/0x588
75) 3288 48 driver_probe_device+0x88/0x1f0
76) 3240 32 __device_attach_driver+0xac/0x118
77) 3208 48 bus_for_each_drv+0x90/0xe0
78) 3160 48 __device_attach+0xe0/0x1ec
79) 3112 16 device_initial_probe+0x1c/0x20
80) 3096 32 bus_probe_device+0x98/0xa0
81) 3064 96 device_add+0x37c/0x944
82) 2968 16 of_device_add+0x44/0x4c
83) 2952 40 of_platform_device_create_pdata+0xa0/0xcc
84) 2912 112 of_platform_bus_create+0x1bc/0x34c
85) 2800 48 of_platform_populate+0x90/0x124
86) 2752 104 sysc_probe+0x10e4/0x1628
87) 2648 32 platform_probe+0x68/0xc8
88) 2616 64 really_probe+0x114/0x588
89) 2552 48 driver_probe_device+0x88/0x1f0
90) 2504 32 __device_attach_driver+0xac/0x118
91) 2472 48 bus_for_each_drv+0x90/0xe0
92) 2424 48 __device_attach+0xe0/0x1ec
93) 2376 16 device_initial_probe+0x1c/0x20
94) 2360 32 bus_probe_device+0x98/0xa0
95) 2328 96 device_add+0x37c/0x944
96) 2232 16 of_device_add+0x44/0x4c
97) 2216 40 of_platform_device_create_pdata+0xa0/0xcc
98) 2176 112 of_platform_bus_create+0x1bc/0x34c
99) 2064 48 of_platform_populate+0x90/0x124
100) 2016 32 simple_pm_bus_probe+0x54/0x80
101) 1984 32 platform_probe+0x68/0xc8
102) 1952 64 really_probe+0x114/0x588
103) 1888 48 driver_probe_device+0x88/0x1f0
104) 1840 32 __device_attach_driver+0xac/0x118
105) 1808 48 bus_for_each_drv+0x90/0xe0
106) 1760 48 __device_attach+0xe0/0x1ec
107) 1712 16 device_initial_probe+0x1c/0x20
108) 1696 32 bus_probe_device+0x98/0xa0
109) 1664 96 device_add+0x37c/0x944
110) 1568 16 of_device_add+0x44/0x4c
111) 1552 40 of_platform_device_create_pdata+0xa0/0xcc
112) 1512 112 of_platform_bus_create+0x1bc/0x34c
113) 1400 48 of_platform_populate+0x90/0x124
114) 1352 32 simple_pm_bus_probe+0x54/0x80
115) 1320 32 platform_probe+0x68/0xc8
116) 1288 64 really_probe+0x114/0x588
117) 1224 48 driver_probe_device+0x88/0x1f0
118) 1176 32 __device_attach_driver+0xac/0x118
119) 1144 48 bus_for_each_drv+0x90/0xe0
120) 1096 48 __device_attach+0xe0/0x1ec
121) 1048 16 device_initial_probe+0x1c/0x20
122) 1032 32 bus_probe_device+0x98/0xa0
123) 1000 96 device_add+0x37c/0x944
124) 904 16 of_device_add+0x44/0x4c
125) 888 40 of_platform_device_create_pdata+0xa0/0xcc
126) 848 112 of_platform_bus_create+0x1bc/0x34c
127) 736 48 of_platform_populate+0x90/0x124
128) 688 32 simple_pm_bus_probe+0x54/0x80
129) 656 32 platform_probe+0x68/0xc8
130) 624 64 really_probe+0x114/0x588
131) 560 48 driver_probe_device+0x88/0x1f0
132) 512 32 __device_attach_driver+0xac/0x118
133) 480 48 bus_for_each_drv+0x90/0xe0
134) 432 48 __device_attach+0xe0/0x1ec
135) 384 16 device_initial_probe+0x1c/0x20
136) 368 32 bus_probe_device+0x98/0xa0
137) 336 48 deferred_probe_work_func+0xac/0xfc
138) 288 104 process_one_work+0x270/0x7c4
139) 184 48 worker_thread+0x1ec/0x524
140) 136 48 kthread+0x160/0x178
141) 88 8 ret_from_fork+0x14/0x38
142) 80 80 0x0
Tomi