Re: Random stack corruption on v5.13 with dra76

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 21/05/2021 13:30, Tomi Valkeinen wrote:
On 21/05/2021 12:14, Tony Lindgren wrote:
* Tomi Valkeinen <tomi.valkeinen@xxxxxxxxxxxxxxxx> [210521 08:45]:
On 21/05/2021 10:39, Tony Lindgren wrote:
* Tomi Valkeinen <tomi.valkeinen@xxxxxxxxxxxxxxxx> [210521 07:05]:
On 21/05/2021 08:36, Tony Lindgren wrote:
* Tomi Valkeinen <tomi.valkeinen@xxxxxxxxxxxxxxxx> [210520 08:27]:
Hi,

I've noticed that the v5.13 rcs crash randomly (but quite often) on dra76 evm
(I haven't tested other boards). Anyone else seen this problem?

I have not seen this so far and beagle-x15 is behaving for me.

Does it always happen on boot?

No, but quite often. I can't really say how often, as it's annoyingly random. I tried to bisect, but that proved to be difficult as sometimes I get multiple (5+)
successful boots before the crash.

I tested with x15, same issue (below). So... Something in my kernel config? Or compiler? Looks like the crash happens always very soon after (or during) probing palmas.

After about 10 reboots with your .config I'm seeing it now too on
beagle-x15. So far no luck reproducing it with omap2plus_defconfig.

I think I have an easy way to see if a kernel is good or bad, by printing stack_not_used(current) in the first call to omap_i2c_xfer_irq(). There's a
huge drop between v5.12 and v5.13-rc1.

And interestingly, sometimes a simple printk seems to use hundreds of bytes
of stack (i.e. compare stack usage before and after the print). But not
always. So maybe the issue is somehow related to printk.

I'm bisecting.

OK sounds good to me.

Well, I found the bad commit but unfortunately it doesn't exactly point where the issue is.

f483a3e123410bd1c78af295bf65feffb6769a98 is the first bad commit
commit f483a3e123410bd1c78af295bf65feffb6769a98
Author: Tony Lindgren <tony@xxxxxxxxxxx>
Date:   Wed Mar 10 14:03:48 2021 +0200

     ARM: dts: Configure simple-pm-bus for dra7 l4_per1

    We can now probe interconnects with device tree only configuration using
     simple-pm-bus and genpd.

     Tested-by: Kishon Vijay Abraham I <kishon@xxxxxx>
     Signed-off-by: Tony Lindgren <tony@xxxxxxxxxxx>

  arch/arm/boot/dts/dra7-l4.dtsi | 9 ++++++---
  1 file changed, 6 insertions(+), 3 deletions(-)


The difference is clear, though. With 9a75368b6426739e8b798592f084cb682d760568, which is the last good commit, when I print the stack usage with stack_not_used() in three different places in omap_i2c_xfer_irq(), I get always prints roughly like:

STACK FREE omap_i2c_xfer_irq: 2972, 2972, 2972

And these repeat exactly the same for each call to omap_i2c_xfer_irq (at least during palmas probe).

With the bad commit the situation is different. The first call to omap_i2c_xfer_irq prints:

STACK FREE omap_i2c_xfer_irq: 2024, 2024, 2024

so we're already using 1k more. But then, instead of the stack usage staying the same, consecutive calls show increased stack usage. It doesn't increase for each xfer call, but after about 10 calls, I'm getting ~1800, ten calls more I see ~800, and going down to ~500.

However, with this bad commit, I don't see the empty stack going below ~500, so I don't get crashes. But going to a more recent commit, like 01d7136894410a71932096e0fb9f1d301b6ccf07, the situation is much worse. The first print shows:

STACK FREE omap_i2c_xfer_irq: 1164, 1164, 1164

and it quickly goes to stack overflow.

Well... I guess there's no bug as such. The simple-pm-bus just uses a lot more stack. Looking at the reports from CONFIG_STACK_TRACER, in the good commit the biggest stack is in USB probing with depth of 4896. With 01d7136894410a71932096e0fb9f1d301b6ccf07, and a debug print in omap_i2c_xfer_common, I can get to 7544.

I don't know what to do about this, so I'm giving up for the time being and "fix" it for myself by increasing THREAD_SIZE_ORDER.

# cat /sys/kernel/debug/tracing/stack_trace
        Depth    Size   Location    (143 entries)
        -----    ----   --------
  0)     7544      28   ftrace_graph_call+0x0/0xc
  1)     7516      28   rcu_read_lock_sched_held+0x38/0x88
  2)     7488     120   lock_acquire+0x1ac/0x560
  3)     7368      16   _raw_spin_lock_irqsave+0x64/0x80
  4)     7352      56   timekeeping_advance+0x3c/0x934
  5)     7296     144   update_wall_time+0x1c/0x20
  6)     7152      16   tick_do_update_jiffies64+0x1f0/0x304
  7)     7136     104   tick_sched_timer+0xf0/0x10c
  8)     7032      96   __hrtimer_run_queues+0x2d8/0x68c
  9)     6936     112   hrtimer_interrupt+0x158/0x388
 10)     6824      16   dmtimer_clockevent_interrupt+0x34/0x3c
 11)     6808      64   __handle_irq_event_percpu+0x108/0x384
 12)     6744      40   handle_irq_event+0x68/0xe0
 13)     6704      32   handle_fasteoi_irq+0xbc/0x224
 14)     6672      40   __handle_domain_irq+0x98/0x10c
 15)     6632      48   gic_handle_irq+0x9c/0xc4
 16)     6584      68   __irq_svc+0x70/0x98
 17)     6516      60   console_unlock+0x444/0x688
 18)     6456     196   vprintk_emit+0x138/0x344
 19)     6260      68   vprintk_default+0x34/0x3c
 20)     6192      48   vprintk_func+0xec/0x194
 21)     6144      24   printk+0x3c/0x64
 22)     6120     104   omap_i2c_xfer_common+0x698/0x6e4
 23)     6016      16   omap_i2c_xfer_irq+0x1c/0x20
 24)     6000      80   __i2c_transfer+0x184/0xa80
 25)     5920      40   i2c_transfer+0xa8/0x144
 26)     5880      64   regmap_i2c_read+0x6c/0xa8
 27)     5816      72   _regmap_raw_read+0x138/0x474
 28)     5744      40   _regmap_bus_read+0x54/0x80
 29)     5704      56   _regmap_read+0x74/0x248
 30)     5648      32   regmap_read+0x4c/0x70
 31)     5616      32   regulator_get_voltage_sel_regmap+0x40/0x94
 32)     5584      32   regulator_attr_is_visible+0x18c/0x278
 33)     5552      88   internal_create_group+0xe4/0x3e0
 34)     5464      40   internal_create_groups.part.0+0x54/0x98
 35)     5424      16   sysfs_create_groups+0x24/0x30
 36)     5408      96   device_add+0x280/0x944
 37)     5312     104   regulator_register+0x694/0xc4c
 38)     5208      32   devm_regulator_register+0x48/0x84
 39)     5176      72   palmas_smps_registration+0x1b4/0x480
 40)     5104     136   palmas_regulators_probe+0x348/0x450
 41)     4968      32   platform_probe+0x68/0xc8
 42)     4936      64   really_probe+0x114/0x588
 43)     4872      48   driver_probe_device+0x88/0x1f0
 44)     4824      32   __device_attach_driver+0xac/0x118
 45)     4792      48   bus_for_each_drv+0x90/0xe0
 46)     4744      48   __device_attach+0xe0/0x1ec
 47)     4696      16   device_initial_probe+0x1c/0x20
 48)     4680      32   bus_probe_device+0x98/0xa0
 49)     4648      96   device_add+0x37c/0x944
 50)     4552      16   of_device_add+0x44/0x4c
 51)     4536      40   of_platform_device_create_pdata+0xa0/0xcc
 52)     4496     112   of_platform_bus_create+0x1bc/0x34c
 53)     4384      48   of_platform_populate+0x90/0x124
 54)     4336      32   devm_of_platform_populate+0x4c/0x90
 55)     4304      88   palmas_i2c_probe+0x478/0x608
 56)     4216      40   i2c_device_probe+0x2a0/0x2f8
 57)     4176      64   really_probe+0x114/0x588
 58)     4112      48   driver_probe_device+0x88/0x1f0
 59)     4064      32   __device_attach_driver+0xac/0x118
 60)     4032      48   bus_for_each_drv+0x90/0xe0
 61)     3984      48   __device_attach+0xe0/0x1ec
 62)     3936      16   device_initial_probe+0x1c/0x20
 63)     3920      32   bus_probe_device+0x98/0xa0
 64)     3888      96   device_add+0x37c/0x944
 65)     3792      24   device_register+0x24/0x28
 66)     3768      48   i2c_new_client_device+0x140/0x264
 67)     3720     112   of_i2c_register_devices+0xdc/0x188
 68)     3608      64   i2c_register_adapter+0x220/0x700
 69)     3544      32   __i2c_add_numbered_adapter+0x60/0xac
 70)     3512      32   i2c_add_adapter+0xa0/0xe4
 71)     3480      16   i2c_add_numbered_adapter+0x2c/0x30
 72)     3464      80   omap_i2c_probe+0x404/0x6b8
 73)     3384      32   platform_probe+0x68/0xc8
 74)     3352      64   really_probe+0x114/0x588
 75)     3288      48   driver_probe_device+0x88/0x1f0
 76)     3240      32   __device_attach_driver+0xac/0x118
 77)     3208      48   bus_for_each_drv+0x90/0xe0
 78)     3160      48   __device_attach+0xe0/0x1ec
 79)     3112      16   device_initial_probe+0x1c/0x20
 80)     3096      32   bus_probe_device+0x98/0xa0
 81)     3064      96   device_add+0x37c/0x944
 82)     2968      16   of_device_add+0x44/0x4c
 83)     2952      40   of_platform_device_create_pdata+0xa0/0xcc
 84)     2912     112   of_platform_bus_create+0x1bc/0x34c
 85)     2800      48   of_platform_populate+0x90/0x124
 86)     2752     104   sysc_probe+0x10e4/0x1628
 87)     2648      32   platform_probe+0x68/0xc8
 88)     2616      64   really_probe+0x114/0x588
 89)     2552      48   driver_probe_device+0x88/0x1f0
 90)     2504      32   __device_attach_driver+0xac/0x118
 91)     2472      48   bus_for_each_drv+0x90/0xe0
 92)     2424      48   __device_attach+0xe0/0x1ec
 93)     2376      16   device_initial_probe+0x1c/0x20
 94)     2360      32   bus_probe_device+0x98/0xa0
 95)     2328      96   device_add+0x37c/0x944
 96)     2232      16   of_device_add+0x44/0x4c
 97)     2216      40   of_platform_device_create_pdata+0xa0/0xcc
 98)     2176     112   of_platform_bus_create+0x1bc/0x34c
 99)     2064      48   of_platform_populate+0x90/0x124
100)     2016      32   simple_pm_bus_probe+0x54/0x80
101)     1984      32   platform_probe+0x68/0xc8
102)     1952      64   really_probe+0x114/0x588
103)     1888      48   driver_probe_device+0x88/0x1f0
104)     1840      32   __device_attach_driver+0xac/0x118
105)     1808      48   bus_for_each_drv+0x90/0xe0
106)     1760      48   __device_attach+0xe0/0x1ec
107)     1712      16   device_initial_probe+0x1c/0x20
108)     1696      32   bus_probe_device+0x98/0xa0
109)     1664      96   device_add+0x37c/0x944
110)     1568      16   of_device_add+0x44/0x4c
111)     1552      40   of_platform_device_create_pdata+0xa0/0xcc
112)     1512     112   of_platform_bus_create+0x1bc/0x34c
113)     1400      48   of_platform_populate+0x90/0x124
114)     1352      32   simple_pm_bus_probe+0x54/0x80
115)     1320      32   platform_probe+0x68/0xc8
116)     1288      64   really_probe+0x114/0x588
117)     1224      48   driver_probe_device+0x88/0x1f0
118)     1176      32   __device_attach_driver+0xac/0x118
119)     1144      48   bus_for_each_drv+0x90/0xe0
120)     1096      48   __device_attach+0xe0/0x1ec
121)     1048      16   device_initial_probe+0x1c/0x20
122)     1032      32   bus_probe_device+0x98/0xa0
123)     1000      96   device_add+0x37c/0x944
124)      904      16   of_device_add+0x44/0x4c
125)      888      40   of_platform_device_create_pdata+0xa0/0xcc
126)      848     112   of_platform_bus_create+0x1bc/0x34c
127)      736      48   of_platform_populate+0x90/0x124
128)      688      32   simple_pm_bus_probe+0x54/0x80
129)      656      32   platform_probe+0x68/0xc8
130)      624      64   really_probe+0x114/0x588
131)      560      48   driver_probe_device+0x88/0x1f0
132)      512      32   __device_attach_driver+0xac/0x118
133)      480      48   bus_for_each_drv+0x90/0xe0
134)      432      48   __device_attach+0xe0/0x1ec
135)      384      16   device_initial_probe+0x1c/0x20
136)      368      32   bus_probe_device+0x98/0xa0
137)      336      48   deferred_probe_work_func+0xac/0xfc
138)      288     104   process_one_work+0x270/0x7c4
139)      184      48   worker_thread+0x1ec/0x524
140)      136      48   kthread+0x160/0x178
141)       88       8   ret_from_fork+0x14/0x38
142)       80      80   0x0


 Tomi



[Index of Archives]     [Linux Arm (vger)]     [ARM Kernel]     [ARM MSM]     [Linux Tegra]     [Linux WPAN Networking]     [Linux Wireless Networking]     [Maemo Users]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux