On Thu, 27 Oct 2022 12:38:16 -0700
Guenter Roeck <linux@xxxxxxxxxxxx> wrote:

> On 10/27/22 12:27, Steven Rostedt wrote:
> > On Thu, 27 Oct 2022 15:20:58 -0400
> > Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >
> >>> (many more of those)
> >>> ...
> >>> [ 16.329989] timer_fixup_free+0x40/0x54
> >>
> >> Ah, I see the issue here. Looks like the timer_fixup_free() is calling
> >> itself and crashing.
> >>
> >> Let me take a look into that. I didn't touch the fixup code, and there
> >> could be an assumption there that it's behaving with the old approach.
> >
> > Can you add this and see if it makes this issue go away?
> >
>
> Yes, that fixes the crash. However, it still reports
>
> [ 12.235054] ------------[ cut here ]------------
> [ 12.235240] ODEBUG: free active (active state 0) object type: timer_list hint: tcp_write_timer+0x0/0x190
> [ 12.237331] WARNING: CPU: 0 PID: 310 at lib/debugobjects.c:502 debug_print_object+0xb8/0x100
> ...
> [ 12.255251] Call trace:
> [ 12.255305]  debug_print_object+0xb8/0x100
> [ 12.255385]  __debug_check_no_obj_freed+0x1d0/0x25c
> [ 12.255474]  debug_check_no_obj_freed+0x20/0x90
> [ 12.255555]  slab_free_freelist_hook.constprop.0+0xac/0x1b0
> [ 12.255650]  kmem_cache_free+0x1ac/0x500
> [ 12.255728]  __sk_destruct+0x140/0x2a0
> [ 12.255805]  sk_destruct+0x54/0x64
> [ 12.255877]  __sk_free+0x74/0x120
> [ 12.255944]  sk_free+0x64/0x8c
> [ 12.256009]  tcp_close+0x94/0xc0
> [ 12.256076]  inet_release+0x50/0xb0
> [ 12.256145]  __sock_release+0x44/0xbc
> [ 12.256219]  sock_close+0x18/0x30
> [ 12.256292]  __fput+0x84/0x270
> [ 12.256361]  ____fput+0x10/0x20
> [ 12.256426]  task_work_run+0x88/0xf0
> [ 12.256499]  do_exit+0x334/0xafc
> [ 12.256566]  do_group_exit+0x34/0x90
> [ 12.256634]  __arm64_sys_exit_group+0x18/0x20
> [ 12.256713]  invoke_syscall+0x48/0x114
> [ 12.256789]  el0_svc_common.constprop.0+0x60/0x11c
> [ 12.256874]  do_el0_svc+0x30/0xd0
> [ 12.256943]  el0_svc+0x48/0xc0
> [ 12.257008]  el0t_64_sync_handler+0xbc/0x13c
> [ 12.257086]  el0t_64_sync+0x18c/0x190
>
> Is that a real problem or a false positive ? I didn't see that
> without your patch series (which of course might be the whole point
> of the series).
>

I think this is indeed an issue, and I'm replying to the net patch as it
has the necessary folks Cc'd.

The ipv4 tcp code has:

void tcp_init_xmit_timers(struct sock *sk)
{
	inet_csk_init_xmit_timers(sk, &tcp_write_timer, &tcp_delack_timer,
				  &tcp_keepalive_timer);

And from the above back trace:

  tcp_close()

where I'm assuming that tcp_disconnect() or tcp_done() was called, both of
which call:

  tcp_clear_xmit_timers(sk);

That calls:

  inet_csk_clear_xmit_timers(sk);

That has:

void inet_csk_clear_xmit_timers(struct sock *sk)
{
	struct inet_connection_sock *icsk = inet_csk(sk);

	icsk->icsk_pending = icsk->icsk_ack.pending = 0;

	sk_stop_timer(sk, &icsk->icsk_retransmit_timer);
	sk_stop_timer(sk, &icsk->icsk_delack_timer);
	sk_stop_timer(sk, &sk->sk_timer);
}

Where:

void sk_stop_timer(struct sock *sk, struct timer_list* timer)
{
	if (del_timer(timer))
		__sock_put(sk);
}

Hence, this is a case where we have timers that have been disabled with
only del_timer() before the timers are freed.

I think we need to update this code to squeeze in a del_timer_shutdown() to
make sure that the timers are never restarted.

There is a sk_stop_timer_sync() that I changed to use del_timer_shutdown(),
but that's only used in one file: net/mptcp/pm_netlink.c

-- Steve
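
For anyone following along, here is a minimal user-space sketch of the distinction drawn above. It is not kernel code: every toy_* name is hypothetical and made up for illustration, and the "shutdown" flag is only an assumption about what del_timer_shutdown() is meant to guarantee (a timer that can no longer be re-armed), based on the description in this thread. The point it tries to show is that a timer stopped with only a del_timer()-style call can still be re-armed by a late caller before the object is freed, which is the situation the ODEBUG warning complains about.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct timer_list; NOT the kernel structure. */
struct toy_timer {
	bool pending;	/* timer is currently queued */
	bool shutdown;	/* assumption: once set, arming is refused */
};

/* Models arming a timer. Returns true if the timer was armed. */
static bool toy_mod_timer(struct toy_timer *t)
{
	if (t->shutdown)
		return false;
	t->pending = true;
	return true;
}

/* Models del_timer(): dequeue only. Returns true if it was pending. */
static bool toy_del_timer(struct toy_timer *t)
{
	bool was_pending = t->pending;

	t->pending = false;
	return was_pending;
}

/* Models the proposed shutdown variant: dequeue and forbid re-arming. */
static bool toy_del_timer_shutdown(struct toy_timer *t)
{
	bool was_pending = toy_del_timer(t);

	t->shutdown = true;
	return was_pending;
}

/*
 * Stop the timer, let a late caller try to re-arm it, then report
 * whether freeing the containing object would be safe (not pending).
 */
static bool toy_free_safe_after_late_rearm(bool use_shutdown)
{
	struct toy_timer t = { .pending = false, .shutdown = false };

	toy_mod_timer(&t);
	assert(t.pending);		/* armed as in normal operation */

	if (use_shutdown)
		toy_del_timer_shutdown(&t);
	else
		toy_del_timer(&t);

	toy_mod_timer(&t);		/* late re-arm attempt before the free */

	return !t.pending;
}
```

Calling toy_free_safe_after_late_rearm(false) models the sk_stop_timer() path quoted above, and toy_free_safe_after_late_rearm(true) models what a shutdown-based stop would give. The real kernel also has to deal with a callback running concurrently on another CPU, which this sketch ignores entirely.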