On Wed, Sep 22, 2021 at 04:27:02PM -0700, Paul E. McKenney wrote: > On Thu, Sep 23, 2021 at 12:56:39AM +0200, Thomas Gleixner wrote: > > On Mon, Sep 20 2021 at 20:15, syzbot wrote: > > > > Cc+: paulmck > > > > > Hello, > > > > > > syzbot found the following issue on: > > > > > > HEAD commit: 85c698863c15 net/ipv4/tcp_minisocks.c: remove superfluous .. > > > git tree: net-next > > > console output: https://syzkaller.appspot.com/x/log.txt?x=13d9d3e7300000 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=6d93fe4341f98704 > > > dashboard link: https://syzkaller.appspot.com/bug?extid=a10a3d280be23be45d04 > > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2 > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11bd98f7300000 > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > Reported-by: syzbot+a10a3d280be23be45d04@xxxxxxxxxxxxxxxxxxxxxxxxx > > > > > > ------------[ cut here ]------------ > > > ODEBUG: assert_init not available (active state 0) object type: timer_list hint: 0x0 > > > WARNING: CPU: 0 PID: 3816 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505 > > > Modules linked in: > > > CPU: 0 PID: 3816 Comm: syz-executor.0 Not tainted 5.15.0-rc1-syzkaller #0 > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > > > RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505 > > > Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd c0 39 e4 89 4c 89 ee 48 c7 c7 c0 2d e4 89 e8 ef e3 14 05 <0f> 0b 83 05 35 09 91 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3 > > > RSP: 0018:ffffc9000a6770f0 EFLAGS: 00010282 > > > RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000000000 > > > RDX: ffff88805b3db900 RSI: ffffffff815dbd88 RDI: fffff520014cee10 > > > RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 > > > R10: ffffffff815d5b2e R11: 0000000000000000 R12: ffffffff898de180 > > > R13: ffffffff89e43440 R14: ffffffff8164bae0 R15: 1ffff920014cee29 > > > FS: 00007fee66b3a700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > CR2: 00007f362471f3a0 CR3: 00000000272af000 CR4: 00000000001506f0 > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > Call Trace: > > > debug_object_assert_init lib/debugobjects.c:895 [inline] > > > debug_object_assert_init+0x1f4/0x2e0 lib/debugobjects.c:866 > > > debug_timer_assert_init kernel/time/timer.c:739 [inline] > > > debug_assert_init kernel/time/timer.c:784 [inline] > > > try_to_del_timer_sync+0x6d/0x110 kernel/time/timer.c:1229 > > > del_timer_sync+0x138/0x1b0 kernel/time/timer.c:1382 > > > cleanup_srcu_struct+0x14c/0x2f0 kernel/rcu/srcutree.c:379 > > > > So cleanup_srcu_struct() tries to delete a timer which was never initialized.... > > That does not sound like a way to keep a kernel running... > > For dynamically allocated srcu_struct structures, what is supposed to > happen is that init_srcu_struct() is invoked at some point before the > first use. Then init_srcu_struct() invokes init_srcu_struct_fields() > which invokes init_srcu_struct_nodes() which initializes that timer. > Unless the memory allocations in init_srcu_struct_fields() failed, > in which case init_srcu_struct() should have handed you back a -ENOMEM. > > OK, there is a call to init_srcu_struct() in blk_mq_alloc_hctx(). > It does ignore the init_srcu_struct() return value, so maybe a > WARN_ON_ONCE(init_srcu_struct(hctx->srcu)) would be good. Yeah. That should be the issue because del_timer_sync() from flush_delayed_work() in cleanup_srcu_struct() isn't complained from the stack trace log, and just the timer in perpcu field of ->sda isn't initialized, and ->sda is allocated via alloc_percpu(). > > The ->srcu field is declared as follows: > > struct srcu_struct srcu[]; > > The blk_mq_hw_ctx_size() function adjusts the size. All of this is > controlled by a BLK_MQ_F_BLOCKING flag. If this flag were to change, > clearly bad things could happen. The places where it is set look to me > to be initialization time, but it might be good to have someone familiar > with this code double-check this. Or have a separate bit that records > the state of BLK_MQ_F_BLOCKING at blk_mq_alloc_hctx() time and complain > bitterly if there was a change at blk_mq_hw_sysfs_release() time. BLK_MQ_F_BLOCKING is set from the beginning and never changed. Thanks, Ming