It already works. Thank you very much.

-----Original Message-----
From: Xin Long [mailto:lucien.xin@xxxxxxxxx]
Sent: November 4, 2022 3:37
To: Caowangbao <caowangbao@xxxxxxxxxx>
Cc: Chenzhen(EulerOS) <chenzhen126@xxxxxxxxxx>; vyasevich@xxxxxxxxx; nhorman@xxxxxxxxxxxxx; marcelo.leitner@xxxxxxxxx; linux-sctp@xxxxxxxxxxxxxxx; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; Yanan (Euler) <yanan@xxxxxxxxxx>
Subject: Re: BUG: kernel NULL pointer dereference in sctp_sched_dequeue_common

On Thu, Nov 3, 2022 at 5:53 AM Caowangbao <caowangbao@xxxxxxxxxx> wrote:
>
> I have narrowed down the reproduction conditions and can reproduce the problem by running the following statements:
>
> 18:00:56 executing program 0:
> r0 = socket$inet6_sctp(0xa, 0x1, 0x84)
> setsockopt$inet_sctp6_SCTP_SOCKOPT_BINDX_ADD(r0, 0x84, 0x64,
> &(0x7f00000001c0)=[@in={0x2, 0x4e20, @empty}], 0x10) (async)
> getsockopt$inet_sctp6_SCTP_SOCKOPT_CONNECTX3(r0, 0x84, 0x6f,
> &(0x7f0000000580)={<r1=>0x0, 0x10, &(0x7f0000000540)=[@in={0x2,
> 0x4e20, @local}]}, &(0x7f0000000600)=0x10) (async)
> r2 = dup2(r0, r0)
> setsockopt$inet_sctp6_SCTP_DEFAULT_PRINFO(r2, 0x84, 0x72,
> &(0x7f0000000000)={0x0, 0x6, 0x30}, 0xc) (async)

Thanks for the statements.

The crash seems related to SCTP_PR_SCTP_PRIO. When there is not enough buffer space for an outgoing message, sctp_prsctp_prune_unsent() prunes the out_chunk_list according to the priority set by SCTP_DEFAULT_PRINFO. However, it doesn't clear asoc->stream.out_curr when all fragment chunks of the current message are pruned.

Can you apply this patch to your kernel and give it a try?

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index e213aaf45d67..41b8065cfe65 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -384,6 +384,7 @@ static int sctp_prsctp_prune_unsent(struct sctp_association *asoc,
 {
 	struct sctp_outq *q = &asoc->outqueue;
 	struct sctp_chunk *chk, *temp;
+	struct sctp_stream_out *sout;
 
 	q->sched->unsched_all(&asoc->stream);
 
@@ -398,12 +399,12 @@ static int sctp_prsctp_prune_unsent(struct sctp_association *asoc,
 		sctp_sched_dequeue_common(q, chk);
 		asoc->sent_cnt_removable--;
 		asoc->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
-		if (chk->sinfo.sinfo_stream < asoc->stream.outcnt) {
-			struct sctp_stream_out *streamout =
-				SCTP_SO(&asoc->stream, chk->sinfo.sinfo_stream);
 
-			streamout->ext->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
-		}
+		sout = SCTP_SO(&asoc->stream, chk->sinfo.sinfo_stream);
+		sout->ext->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
+		if (asoc->stream.out_curr == sout &&
+		    list_is_last(&chk->frag_list, &chk->msg->chunks))
+			asoc->stream.out_curr = NULL; /* clear out_curr if all frag chunks are pruned */
 
 		msg_len -= chk->skb->truesize + sizeof(struct sctp_chunk);
 		sctp_chunk_free(chk);

Thanks.

> sendmmsg$sock(r2, ...) (async)
> r3 = socket$inet6_sctp(0xa, 0x1, 0x84)
> r4 = dup2(r2, r3)
> setsockopt$inet_sctp6_SCTP_DEFAULT_PRINFO(r4, 0x84, 0x72,
> &(0x7f0000000040), 0xc)
> sendmsg$alg(r4, &(0x7f0000002700)={0x0, 0x0,
> &(0x7f0000002500)=[{&(0x7f0000004100)='@', 0x1}], 0x1}, 0x0) (async)
> setsockopt$inet_sctp6_SCTP_DEFAULT_SNDINFO(r2, 0x84, 0x22,
> &(0x7f0000002480)={0x5, 0x202, 0x9, 0x10001, r1}, 0x10) (async)
> write$tun(r4,
> &(0x7f0000002640)=ANY=[@ANYBLOB="000000000000000000000000000006000000a
> aaaaaaaaabb467219e67b1d64441845437b38c12ddeb986e59e82bd4247f1ed8a05309
> a31b9494bda521ffd4b68bf072b030d7ef04cc219c73572fac79f47369d49ae19df016
> 41921e3af34cb84766ede45e4fa9a14460fae51557f643d108ba54f7cb8440ce5aa0e6
> 0d7c2c4da"], 0x1e) (async)
> write(r0, &(0x7f0000000080)="e4", 0x1)
>
>
>
> -----Original Message-----
> From: Caowangbao
> Sent: November 3, 2022 10:13
> To: 'Xin Long' <lucien.xin@xxxxxxxxx>; Chenzhen(EulerOS) <chenzhen126@xxxxxxxxxx>
> Cc: 'vyasevich@xxxxxxxxx' <vyasevich@xxxxxxxxx>; 'nhorman@xxxxxxxxxxxxx' <nhorman@xxxxxxxxxxxxx>; 'marcelo.leitner@xxxxxxxxx' <marcelo.leitner@xxxxxxxxx>; 'linux-sctp@xxxxxxxxxxxxxxx' <linux-sctp@xxxxxxxxxxxxxxx>; 'davem@xxxxxxxxxxxxx' <davem@xxxxxxxxxxxxx>; 'edumazet@xxxxxxxxxx' <edumazet@xxxxxxxxxx>; 'kuba@xxxxxxxxxx' <kuba@xxxxxxxxxx>; 'pabeni@xxxxxxxxxx' <pabeni@xxxxxxxxxx>; 'netdev@xxxxxxxxxxxxxxx' <netdev@xxxxxxxxxxxxxxx>; Yanan (Euler) <yanan@xxxxxxxxxx>
> Subject: Re: BUG: kernel NULL pointer dereference in sctp_sched_dequeue_common
>
> It can be reproduced by the command "./syz-execprog -procs=16 -repeat=0 sctp_sched_dequeue_common" with the attachments.
>
> void sctp_sched_dequeue_common(struct sctp_outq *q, struct sctp_chunk *ch) {
> 	list_del_init(&ch->list);
> 	list_del_init(&ch->stream_list);
> 	q->out_qlen -= ch->skb->len; // ch->skb is null in the VMCore
> }
>
> The kernel log records:
> [23411.786575] list_del corruption, ffffa035ddf01c18->next is NULL
> [23411.787780] WARNING: CPU: 1 PID: 250682 at lib/list_debug.c:49 __list_del_entry_valid+0x59/0xe0
> ******
> [23411.830256] Call Trace:
> [23411.830863] sctp_sched_dequeue_common+0x17/0x70 [sctp]
> [23411.831940] sctp_sched_fcfs_dequeue+0x37/0x50 [sctp]
> [23411.832967] sctp_outq_flush_data+0x85/0x360 [sctp]
> It means "ch->list" has no element.
>
> And in VMCore, there are many calls like:
> #2 [ffffaf7d84f6bbb8] __lock_sock at ffffffff8ac74ef9
> #3 [ffffaf7d84f6bc08] lock_sock_nested at ffffffff8ac74f92
> #4 [ffffaf7d84f6bc20] sctp_wait_for_sndbuf at ffffffffc0c8d9d2 [sctp]
> #5 [ffffaf7d84f6bc98] sctp_sendmsg_to_asoc at ffffffffc0c8dd1e [sctp]
> #6 [ffffaf7d84f6bd08] sctp_sendmsg at ffffffffc0c95f55 [sctp]
> #7 [ffffaf7d84f6bdb8] sock_sendmsg at ffffffff8ac6fd0b
> #8 [ffffaf7d84f6bdd0] sock_write_iter at ffffffff8ac6fdb7
> #9 [ffffaf7d84f6be48] new_sync_write at ffffffff8a784021
> #10 [ffffaf7d84f6bed0] vfs_write at ffffffff8a784d07
> #11 [ffffaf7d84f6bf08] ksys_write at ffffffff8a78719b
> #12 [ffffaf7d84f6bf40] do_syscall_64 at ffffffff8ae9a8b3
> It may have something to do with these concurrent invocations.
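
For reference, the "list_del corruption ... ->next is NULL" warning quoted above can be illustrated with a small userspace sketch. This is only an approximation for illustration: the names list_node, node_init, node_add_tail, node_del_valid and node_del_init are made up here, and the check is loosely modeled on a list-debug sanity test rather than copied from lib/list_debug.c. An empty list head, or an entry already removed with a list_del_init-style helper, points at itself, so a NULL ->next would suggest memory that was zeroed or never set up as a list entry, rather than merely an empty list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a doubly linked list node (hypothetical names). */
struct list_node {
	struct list_node *next, *prev;
};

/* An initialised node or empty head points at itself. */
static void node_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

/* Append n just before head, i.e. at the tail of the list. */
static void node_add_tail(struct list_node *n, struct list_node *head)
{
	assert(n != NULL && head != NULL);
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Sanity check in the spirit of a list-debug test: links must be
 * non-NULL and both neighbours must point back at the entry. */
static bool node_del_valid(const struct list_node *n)
{
	if (n->next == NULL || n->prev == NULL)
		return false;
	return n->prev->next == n && n->next->prev == n;
}

/* Unlink n and re-initialise it; refuse to touch inconsistent links. */
static bool node_del_init(struct list_node *n)
{
	if (!node_del_valid(n))
		return false;
	n->next->prev = n->prev;
	n->prev->next = n->next;
	node_init(n);
	return true;
}
```

In this sketch, removing a linked entry succeeds, removing it a second time also succeeds because the entry points at itself, and only an entry whose links are NULL is rejected.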
>
> -----Original Message-----
> From: Xin Long [mailto:lucien.xin@xxxxxxxxx]
> Sent: November 3, 2022 9:20
> To: Chenzhen(EulerOS) <chenzhen126@xxxxxxxxxx>
> Cc: vyasevich@xxxxxxxxx; nhorman@xxxxxxxxxxxxx; marcelo.leitner@xxxxxxxxx; linux-sctp@xxxxxxxxxxxxxxx; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; Caowangbao <caowangbao@xxxxxxxxxx>; Yanan (Euler) <yanan@xxxxxxxxxx>
> Subject: Re: BUG: kernel NULL pointer dereference in sctp_sched_dequeue_common
>
> On Wed, Nov 2, 2022 at 10:29 AM Zhen Chen <chenzhen126@xxxxxxxxxx> wrote:
> >
> > Hi, all
> >
> > We found the following crash when running fuzz tests on stable-5.10.
> >
> > ------------[ cut here ]------------
> > list_del corruption, ffffa035ddf01c18->next is NULL
> > WARNING: CPU: 1 PID: 250682 at lib/list_debug.c:49 __list_del_entry_valid+0x59/0xe0
> > CPU: 1 PID: 250682 Comm: syz-executor.7 Kdump: loaded Tainted: G O
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-20181220_000000-szxrtosci10000 04/01/2014
> > RIP: 0010:__list_del_entry_valid+0x59/0xe0
> > Code: c0 74 5a 4d 8b 00 49 39 f0 75 6a 48 8b 52 08 4c 39 c2 75 79 b8 01 00 00 00 c3 cc cc cc cc 48 c7 c7 68 ae 78 8b e8 d2 3d 4e 00 <0f> 0b 31 c0 c3 cc cc cc cc 48 c7 c7 90 ae 78 8b e8 bd 3d 4e 00 0f
> > RSP: 0018:ffffaf7d84a57930 EFLAGS: 00010286
> > RAX: 0000000000000000 RBX: ffffa035ddf01c18 RCX: 0000000000000000
> > RDX: ffffa035facb0820 RSI: ffffa035faca0410 RDI: ffffa035faca0410
> > RBP: ffffa035dddff6f8 R08: 0000000000000000 R09: ffffaf7d84a57770
> > R10: ffffaf7d84a57768 R11: ffffffff8bddc248 R12: ffffa035ddf01c18
> > R13: ffffaf7d84a57af8 R14: ffffaf7d84a57c28 R15: 0000000000000000
> > FS: 00007fb7353ae700(0000) GS:ffffa035fac80000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007f509a3d0ee8 CR3: 000000010f7c2001 CR4: 00000000001706e0
> > Call Trace:
> >  sctp_sched_dequeue_common+0x17/0x70 [sctp]
> >  sctp_sched_fcfs_dequeue+0x37/0x50 [sctp]
> >  sctp_outq_flush_data+0x85/0x360 [sctp]
> >  sctp_outq_uncork+0x77/0xa0 [sctp]
> >  sctp_cmd_interpreter.constprop.0+0x164/0x1450 [sctp]
> >  sctp_side_effects+0x37/0xe0 [sctp]
> >  sctp_do_sm+0xd0/0x230 [sctp]
> >  sctp_primitive_SEND+0x2f/0x40 [sctp]
> >  sctp_sendmsg_to_asoc+0x3fa/0x5c0 [sctp]
> >  sctp_sendmsg+0x3d5/0x440 [sctp]
> >  sock_sendmsg+0x5b/0x70
> >  sock_write_iter+0x97/0x100
> >  new_sync_write+0x1a1/0x1b0
> >  vfs_write+0x1b7/0x250
> >  ksys_write+0xab/0xe0
> >  do_syscall_64+0x33/0x40
> >  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> > RIP: 0033:0x461e3d
> > Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
> > RSP: 002b:00007fb7353adc08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > RAX: ffffffffffffffda RBX: 000000000058c1d0 RCX: 0000000000461e3d
> > RDX: 000000000000001e RSI: 0000000020002640 RDI: 0000000000000004
> > RBP: 000000000058c1d0 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > R13: 00007fb7353ae700 R14: 00007ffc4c20ce00 R15: 0000000000000fff
> > ---[ end trace 332cf75246d5ba68 ]---
> > BUG: kernel NULL pointer dereference, address: 0000000000000070
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 800000010c0d4067 P4D 800000010c0d4067 PUD 10f275067 PMD 0
> > Oops: 0000 [#1] SMP PTI
> > CPU: 1 PID: 250682 Comm: syz-executor.7 Kdump: loaded Tainted: G W O
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-20181220_000000-szxrtosci10000 04/01/2014
> > RIP: 0010:sctp_sched_dequeue_common+0x5c/0x70 [sctp]
> > Code: 5b 08 4c 89 e7 e8 44 c5 cc c9 84 c0 74 0f 48 8b 53 18 48 8b 43 20 48 89 42 08 48 89 10 48 8b 43 38 4c 89 63 18 4c 89 63 20 5b <8b> 40 70 29 45 20 5d 41 5c c3 cc cc cc cc 66 0f 1f 44 00 00 0f 1f
> > RSP: 0018:ffffaf7d84a57940 EFLAGS: 00010202
> > RAX: 0000000000000000 RBX: ffffaf7d84a579a0 RCX: 0000000000000000
> > RDX: ffffa035ddf01c30 RSI: ffffa035ddf01c30 RDI: ffffa035ddf01c30
> > RBP: ffffa035dddff6f8 R08: ffffa035ddf01c30 R09: ffffaf7d84a57770
> > R10: ffffaf7d84a57768 R11: ffffffff8bddc248 R12: ffffa035ddf01c30
> > R13: ffffaf7d84a57af8 R14: ffffaf7d84a57c28 R15: 0000000000000000
> > FS: 00007fb7353ae700(0000) GS:ffffa035fac80000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000070 CR3: 000000010f7c2001 CR4: 00000000001706e0
> > Call Trace:
> >  sctp_sched_fcfs_dequeue+0x37/0x50 [sctp]
> >  sctp_outq_flush_data+0x85/0x360 [sctp]
> >  sctp_outq_uncork+0x77/0xa0 [sctp]
> >  sctp_cmd_interpreter.constprop.0+0x164/0x1450 [sctp]
> >  sctp_side_effects+0x37/0xe0 [sctp]
> >  sctp_do_sm+0xd0/0x230 [sctp]
> >  sctp_primitive_SEND+0x2f/0x40 [sctp]
> >  sctp_sendmsg_to_asoc+0x3fa/0x5c0 [sctp]
> >  sctp_sendmsg+0x3d5/0x440 [sctp]
> >  sock_sendmsg+0x5b/0x70
> >  sock_write_iter+0x97/0x100
> >  new_sync_write+0x1a1/0x1b0
> >  vfs_write+0x1b7/0x250
> >  ksys_write+0xab/0xe0
> >  do_syscall_64+0x33/0x40
> >  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> > RIP: 0033:0x461e3d
> > Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
> > RSP: 002b:00007fb7353adc08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > RAX: ffffffffffffffda RBX: 000000000058c1d0 RCX: 0000000000461e3d
> > RDX: 000000000000001e RSI: 0000000020002640 RDI: 0000000000000004
> > RBP: 000000000058c1d0 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > R13: 00007fb7353ae700 R14: 00007ffc4c20ce00 R15: 0000000000000fff
> >
> >
> > It is quite similar to the issue (see https://lore.kernel.org/all/CAO4mrfcB0d+qbwtfndzqcrL+QEQgfOmJYQMAdzwxRePmP8TY1A@xxxxxxxxxxxxxx/), which was addressed by 181d8d2066c0 (sctp: leave the err path free in sctp_stream_init to sctp_stream_free), but unfortunately the patch does not work with this bug :(
> >
> So this issue is reproducible in your env?
> Can you show what it does in your test or the reproducer if there is one?
>
> Thanks.
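
The explanation given with the patch earlier in this thread (pruning can remove every fragment of the current message while asoc->stream.out_curr still points at its stream) can be pictured with a small userspace model. This is only a sketch under simplifying assumptions: model_stream, model_assoc, model_enqueue, model_dequeue, model_prune and model_scenario are hypothetical names, each stream carries at most one message, and queues are reduced to counters. It is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NSTREAMS 2

/* Hypothetical, heavily simplified model; not the kernel's structures. */
struct model_stream {
	int unsent;                     /* fragments still queued on this stream */
};

struct model_assoc {
	struct model_stream stream[MODEL_NSTREAMS];
	struct model_stream *out_curr;  /* stream with a partially sent message */
	int total_unsent;               /* fragments queued across all streams */
};

enum deq_result { DEQ_OK, DEQ_NOTHING, DEQ_STALE_OUT_CURR };

/* Queue one message of nfrags fragments on stream sid. */
static void model_enqueue(struct model_assoc *a, int sid, int nfrags)
{
	a->stream[sid].unsent += nfrags;
	a->total_unsent += nfrags;
}

/* Dequeue one fragment, preferring the stream recorded in out_curr. */
static enum deq_result model_dequeue(struct model_assoc *a)
{
	struct model_stream *s = a->out_curr;
	int i = 0;

	if (a->total_unsent == 0)
		return DEQ_NOTHING;

	if (s != NULL) {
		/* Real code would take the first entry of an empty per-stream
		 * queue at this point; the model reports it instead. */
		if (s->unsent == 0)
			return DEQ_STALE_OUT_CURR;
	} else {
		while (i < MODEL_NSTREAMS && a->stream[i].unsent == 0)
			i++;
		assert(i < MODEL_NSTREAMS);
		s = &a->stream[i];
	}

	s->unsent--;
	a->total_unsent--;
	/* Stay on this stream until its message is fully sent. */
	a->out_curr = s->unsent ? s : NULL;
	return DEQ_OK;
}

/* Drop every unsent fragment of stream sid, as a priority prune might. */
static void model_prune(struct model_assoc *a, int sid, bool clear_out_curr)
{
	struct model_stream *s = &a->stream[sid];

	a->total_unsent -= s->unsent;
	s->unsent = 0;
	/* The step the patch adds: forget out_curr once its message is gone. */
	if (clear_out_curr && a->out_curr == s)
		a->out_curr = NULL;
}

/* Partially send stream 0, prune the rest of it, then dequeue again. */
static enum deq_result model_scenario(bool clear_out_curr)
{
	struct model_assoc a = { 0 };

	model_enqueue(&a, 0, 3);        /* 3-fragment message on stream 0 */
	model_enqueue(&a, 1, 1);        /* 1-fragment message on stream 1 */
	if (model_dequeue(&a) != DEQ_OK) /* sends fragment 1 of 3 */
		return DEQ_NOTHING;
	model_prune(&a, 0, clear_out_curr);
	return model_dequeue(&a);
}
```

In this model, model_scenario(false) ends in DEQ_STALE_OUT_CURR, which stands in for the dequeue path walking an empty per-stream queue, while model_scenario(true) falls back to stream 1 and returns DEQ_OK.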