On Thu, Dec 08, 2022 at 01:07 AM -08, John Fastabend wrote:
> Eric Dumazet wrote:
>> On Wed, Dec 7, 2022 at 7:38 AM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
>> >
>> > syzbot wrote:
>> > > Hello,
>> > >
>> > > syzbot found the following issue on:
>> > >
>> > > HEAD commit: 6a30d3e3491d selftests: net: Use "grep -E" instead of "egr..
>> > > git tree: net
>> > > console+strace: https://syzkaller.appspot.com/x/log.txt?x=1576b11d880000
>> > > kernel config: https://syzkaller.appspot.com/x/.config?x=cc4b2e0a8e8a8366
>> > > dashboard link: https://syzkaller.appspot.com/bug?extid=04c21ed96d861dccc5cd
>> > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for
>> > > Debian) 2.35.2
>> > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14e1656b880000
>> > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1077da23880000
>> > >
>> > > Downloadable assets:
>> > > disk image:
>> > > https://storage.googleapis.com/syzbot-assets/bbee3d5fc908/disk-6a30d3e3.raw.xz
>> > > vmlinux: https://storage.googleapis.com/syzbot-assets/bf9e258de70e/vmlinux-6a30d3e3.xz
>> > > kernel image:
>> > > https://storage.googleapis.com/syzbot-assets/afaa6696b9e0/bzImage-6a30d3e3.xz
>> > >
>> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> > > Reported-by: syzbot+04c21ed96d861dccc5cd@xxxxxxxxxxxxxxxxxxxxxxxxx
>> > >
>> > > BUG: TASK stack guard page was hit at ffffc90003cd7fa8 (stack is
>> > > ffffc90003cd8000..ffffc90003ce0000)
>> > > stack guard page: 0000 [#1] PREEMPT SMP KASAN
>> > > CPU: 0 PID: 3636 Comm: syz-executor238 Not tainted
>> > > 6.1.0-rc7-syzkaller-00135-g6a30d3e3491d #0
>> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> > > Google 10/26/2022
>> > > RIP: 0010:mark_lock.part.0+0x26/0x1910 kernel/locking/lockdep.c:4593
>> > > Code: 00 00 00 00 41 57 41 56 41 55 41 89 d5 48 ba 00 00 00 00 00 fc ff df
>> > > 41 54 49 89 f4 55 53 48 81 ec 38 01 00 00 48 8d 5c 24 38 <48> 89 3c 24 48
>> > > c7 44 24 38 b3 8a b5 41 48 c1 eb 03 48 c7 44 24 40
>> > > RSP: 0018:ffffc90003cd7fb8 EFLAGS: 00010096
>> > > RAX: 0000000000000004 RBX: ffffc90003cd7ff0 RCX: ffffffff8162a7bf
>> > > RDX: dffffc0000000000 RSI: ffff88801f65e238 RDI: ffff88801f65d7c0
>> > > RBP: ffff88801f65e25a R08: 0000000000000000 R09: ffffffff910f4aff
>> > > R10: fffffbfff221e95f R11: 0000000000000000 R12: ffff88801f65e238
>> > > R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000040000
>> > > FS: 0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
>> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > > CR2: ffffc90003cd7fa8 CR3: 000000000c28e000 CR4: 00000000003506f0
>> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > > Call Trace:
>> > > <TASK>
>> > > mark_lock kernel/locking/lockdep.c:4598 [inline]
>> > > mark_usage kernel/locking/lockdep.c:4543 [inline]
>> > > __lock_acquire+0x847/0x56d0 kernel/locking/lockdep.c:5009
>> > > lock_acquire kernel/locking/lockdep.c:5668 [inline]
>> > > lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
>> > > lock_sock_nested+0x3a/0xf0 net/core/sock.c:3447
>> > > lock_sock include/net/sock.h:1721 [inline]
>> > > sock_map_close+0x75/0x7b0 net/core/sock_map.c:1610
>> >
>> > I'll take a look likely something recent.
>>
>> Fact that sock_map_close can call itself seems risky.
>> We might issue a one time warning and keep the host alive.
>
> Agree seems better to check the condition than loop on close.
> I still need to figure out the bug that got into this state
> though. Thanks.

I know what is happening.

We're not restoring sk_prot in the child socket on clone.
tcp_bpf_clone() callback currently restores sk_prot only if the
listener->sk_prot is &tcp_bpf_prots[*][TCP_BASE].

It should also check for TCP_BPF_RX/TXRX.
It's a regression that slipped through with c5d2177a72a1 ("bpf, sockmap:
Fix race in ingress receive verdict with redirect to self").

And we're clearly missing selftest coverage for this scenario.

I can fix that.