On Thu, 16 Sept 2021 at 16:45, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote: > > > > On Wed, Sep 15, 2021 at 05:41:22AM -0700, syzbot wrote: > > > > > Hello, > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > HEAD commit: 926de8c4326c Merge tag 'acpi-5.15-rc1-3' of git://git.kern.. > > > > > git tree: upstream > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=11fd67ed300000 > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=37df9ef5660a8387 > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=dc3dfba010d7671e05f5 > > > > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1 > > > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > Reported-by: syzbot+dc3dfba010d7671e05f5@xxxxxxxxxxxxxxxxxxxxxxxxx > > > > > > > > #syz dup: KASAN: use-after-free Write in addr_resolve (2) > > > > > > > > Frankly, I still can't figure out how this is happening > > > > > > > > RDMA_USER_CM_CMD_RESOLVE_IP triggers a background work and > > > > RDMA_USER_CM_CMD_DESTROY_ID triggers destruction of the memory the > > > > work touches. > > > > > > > > rdma_addr_cancel() is supposed to ensure that the work isn't and won't > > > > run. > > > > > > > > So to hit this we have to either not call rdma_addr_cancel() when it > > > > is need, or rdma_addr_cancel() has to be broken and continue to allow > > > > the work. > > > > > > > > I could find nothing along either path, though rdma_addr_cancel() > > > > relies on some complicated properties of the workqueues I'm not > > > > entirely positive about. > > > > > > I stared at the code, but it's too complex to grasp it all entirely. > > > There are definitely lots of tricky concurrent state transitions and > > > potential for unexpected interleavings. My bet would be on some tricky > > > hard-to-trigger thread interleaving. > > > > From a uapi perspective the entire thing is serialized with a mutex.. > > > > > The only thing I can think of is adding more WARNINGs to the code to > > > check more of these assumptions. But I don't know if there are any > > > useful testable assumptions... > > > > Do you have any idea why we can't get a reproduction out of syzkaller > > here? > > > > I feel less comfortable with syzkaller's debug output, can you give > > some idea what it might be doing concurrently? > > It looks like a very hard to trigger race (few crashes, no reproducer, > but KASAN reports look sensible). That's probably the reason syzkaller > can't create a reproducer. > From the log it looks like it was triggered by one of these programs > below. But I tried to reproduce manually and had no success. > We are currently doing some improvements to race triggering code in > syzkaller, and may try to use this as a litmus test to see if > syzkaller will do any better: > https://github.com/google/syzkaller/issues/612#issuecomment-920961538 > > Answering your question re what was running concurrently with what. > Each of the syscalls in these programs can run up to 2 times and > ultimately any of these calls can race with any. Potentially syzkaller > can predict values kernel will return (e.g. id's) before kernel > actually returned them. I guess this does not restrict search area for > the bug a lot... > > > 11:16:53 executing program 3: > write$RDMA_USER_CM_CMD_CONNECT(0xffffffffffffffff, > &(0x7f0000000280)={0x6, 0x118, 0xfa00, {{0xfffffff7, 0x6a492eae, > "e0e55819482a40c1c535b72b0bc0bc5e4478995957e1d0fe2311a39ee3960d3488407d52fbef30809118fcbaef590c27d04918aa1348b409d45ba277d9f73bd18868a9c4fde7560288298bde7e9a96c1ef280ca62f4a6f591a2181f2e3d3cf52212fa5ae101aa1bf975763cef32e3a2c73b79d0af1d2e58b82243731e6082cab1cb1c643b7bbec2e6d45bca8a6980f148aaefb71f1933ffa50534b83267139b2324e51ffecb57959bf7e98b60516cebc8f05838a7976cef33b64410626c14dca7dcb22f0902aeb045b88656268a6dd922d6a0e7b7002e8ea90020650dced319050db3130089e5011994d90340a93088e0a8b03ea61ac3f53312342b3d6e038ae", > 0xfc, 0xe1, 0xb2, 0xd0, 0x7, 0x40, 0x0, 0x1}}}, 0x120) > r0 = openat$pfkey(0xffffffffffffff9c, &(0x7f00000001c0), 0x80800, 0x0) > r1 = syz_io_uring_setup(0x1c7, &(0x7f0000000080)={0x0, 0x0, 0x0, 0x0, > 0x7f, 0x0, r0}, &(0x7f00007b6000/0x2000)=nil, > &(0x7f0000ffd000/0x3000)=nil, &(0x7f0000000140)=<r2=>0x0, > &(0x7f0000000100)=<r3=>0x0) > socketpair$unix(0x1, 0x5, 0x0, &(0x7f0000000040)={0xffffffffffffffff, > <r4=>0xffffffffffffffff}) > r5 = openat(0xffffffffffffff9c, > &(0x7f0000000000)='/proc/self/exe\x00', 0x0, 0x0) > mmap(&(0x7f0000000000/0x800000)=nil, 0x800000, 0x0, 0x12, r5, 0x0) > r6 = openat$rdma_cm(0xffffffffffffff9c, &(0x7f0000000540), 0x2, 0x0) > write$RDMA_USER_CM_CMD_CREATE_ID(r6, &(0x7f0000000080)={0x0, 0x18, > 0xfa00, {0xffffffffffffffff, > &(0x7f0000000000)={<r7=>0xffffffffffffffff}, 0x13f}}, 0x20) > write$RDMA_USER_CM_CMD_RESOLVE_IP(r6, &(0x7f0000000100)={0x3, 0x40, > 0xfa00, {{}, {0xa, 0x0, 0x0, @mcast2}, r7}}, 0x48) > write$RDMA_USER_CM_CMD_RESOLVE_IP(r6, &(0x7f0000000180)={0x3, 0x40, > 0xfa00, {{0xa, 0x0, 0x0, @local}, {0xa, 0x0, 0x0, @mcast1}, r7}}, > 0x48) > write$RDMA_USER_CM_CMD_DESTROY_ID(r6, &(0x7f0000000280)={0x1, 0x10, > 0xfa00, {&(0x7f0000000240), r7}}, 0x18) > write$RDMA_USER_CM_CMD_CONNECT(r5, &(0x7f00000003c0)={0x6, 0x118, > 0xfa00, {{0x9, 0x9, > "f703ff619e427c1d7d50fc023c22feb64ea5083376891585a4a8b539bead7f61210a9010d88379b67ebe7a1fc77fbdd4dccaec4b498eafe4b08e7e5b28e9fe54606f87e9618b9ade4e28b66e04c73fe4660de33c075bb9b1a43c59e485dcc259fb21fed21380f9ec2c61e8d29b6069786e8bc3da0f3bded0acd13548d2d76af6e701a258307fbce30c0f452b6a25f39209c830fe557de6f1fb3fdfe4347be3a9fdfeaca47b97e333a266013beef7cb7d7ea746bca1d3a929747a269df24d019e3e413309e58095182dd5dc3c8a088e94abf8d5cd389749cc80e4e452c8dabe7eaadd8144e2c4392e35c1b5ad3369ee7b2f855e5ebe9bdc0e8a464e8a9e4f54c0", > 0x2, 0xff, 0x1, 0x8f, 0x6, 0x3, 0x6}, r7}}, 0x120) > syz_io_uring_submit(r2, r3, > &(0x7f0000000180)=@IORING_OP_READ=@pass_buffer={0x16, 0x4, 0x0, > @fd=r4, 0x0, &(0x7f0000000000)=""/7, 0x7}, 0x0) > syz_io_uring_submit(r2, r3, > &(0x7f0000002f80)=@IORING_OP_LINK_TIMEOUT={0xf, 0x0, 0x0, 0x0, 0x0, > &(0x7f0000000240)={0x0, 0x3938700}}, 0x10000007) > io_uring_enter(r1, 0x45f5, 0x0, 0x0, 0x0, 0xf5ff) > > > > 11:16:55 executing program 4: > r0 = openat$rdma_cm(0xffffffffffffff9c, &(0x7f0000000540), 0x2, 0x0) > write$RDMA_USER_CM_CMD_CREATE_ID(r0, &(0x7f0000000080)={0x0, 0x18, > 0xfa00, {0xffffffffffffffff, > &(0x7f0000000000)={<r1=>0xffffffffffffffff}, 0x13f}}, 0x20) > write$RDMA_USER_CM_CMD_RESOLVE_IP(r0, &(0x7f0000000100)={0x3, 0x40, > 0xfa00, {{0xa, 0xfffd}, {0xa, 0x0, 0x10000000, @ipv4={'\x00', > '\xff\xff', @broadcast}, 0x3}, r1, 0xfffffffe}}, 0x48) > write$RDMA_USER_CM_CMD_RESOLVE_IP(r0, &(0x7f0000000180)={0x3, 0x40, > 0xfa00, {{0xa, 0x2, 0x0, @ipv4={'\x00', '\xff\xff', @multicast2}}, > {0xa, 0x0, 0x0, @initdev={0xfe, 0x88, '\x00', 0x1, 0x0}}, r1}}, 0xd5) > write$RDMA_USER_CM_CMD_DESTROY_ID(r0, &(0x7f0000000280)={0x1, 0x10, > 0xfa00, {&(0x7f0000000240), r1}}, 0x18) > write$RDMA_USER_CM_CMD_BIND_IP(0xffffffffffffffff, > &(0x7f0000000000)={0x2, 0x28, 0xfa00, {0x0, {0xa, 0x4e23, 0x9, > @ipv4={'\x00', '\xff\xff', @rand_addr=0x64010102}, 0x100}, r1}}, 0x30) > openat$qrtrtun(0xffffffffffffff9c, &(0x7f0000002740), 0x101002) > io_setup(0x8, &(0x7f0000000600)=<r2=>0x0) > clock_getres(0xfffffffffffffffd, 0x0) > r3 = openat$hwrng(0xffffffffffffff9c, &(0x7f0000000040), 0x400, 0x0) > r4 = openat$vcsa(0xffffffffffffff9c, &(0x7f00000002c0), 0x8000, 0x0) > r5 = openat$rdma_cm(0xffffffffffffff9c, &(0x7f0000000540), 0x2, 0x0) > write$RDMA_USER_CM_CMD_CREATE_ID(r5, &(0x7f0000000080)={0x0, 0x18, > 0xfa00, {0xffffffffffffffff, > &(0x7f0000000000)={<r6=>0xffffffffffffffff}, 0x13f}}, 0x20) > write$RDMA_USER_CM_CMD_RESOLVE_IP(r5, &(0x7f0000000100)={0x3, 0x40, > 0xfa00, {{}, {0xa, 0x0, 0x0, @mcast2}, r6}}, 0x48) > write$RDMA_USER_CM_CMD_RESOLVE_IP(r5, &(0x7f0000000180)={0x3, 0x40, > 0xfa00, {{0xa, 0x0, 0x0, @local}, {0xa, 0x0, 0x0, @mcast1}, r6}}, > 0x48) > write$RDMA_USER_CM_CMD_DESTROY_ID(r5, &(0x7f0000000280)={0x1, 0x10, > 0xfa00, {&(0x7f0000000240), r6}}, 0x18) > pipe2(&(0x7f0000000340)={<r7=>0xffffffffffffffff}, 0x800) > write$RDMA_USER_CM_CMD_RESOLVE_IP(r7, &(0x7f0000000380)={0x3, 0x40, > 0xfa00, {{0xa, 0x4e23, 0x8, @remote, 0x3ff}, {0xa, 0x4e22, 0x8000, > @initdev={0xfe, 0x88, '\x00', 0x1, 0x0}, 0x8}, r1, 0x1f}}, 0x48) > write$RDMA_USER_CM_CMD_LISTEN(r4, &(0x7f0000000300)={0x7, 0x8, 0xfa00, > {r6, 0x8}}, 0x10) > io_submit(r2, 0x1, &(0x7f0000000200)=[&(0x7f00000000c0)={0x0, 0x0, > 0x0, 0x1, 0x0, 0xffffffffffffffff, > &(0x7f0000000400)="03a0a445bc5d7a9d6c", 0x9, 0x7fffffff, 0x0, 0x0, > r3}]) I noticed we also had 2 KCSAN reports that mention rdma_resolve_addr. On commit 1df0d896: ================================================================== BUG: KCSAN: data-race in addr_handler / cma_check_port write to 0xffff88809fa40a1c of 4 bytes by task 21 on cpu 1: cma_comp_exch drivers/infiniband/core/cma.c:426 [inline] addr_handler+0x9f/0x2b0 drivers/infiniband/core/cma.c:3141 process_one_req+0x22f/0x300 drivers/infiniband/core/addr.c:645 process_one_work+0x3e1/0x9a0 kernel/workqueue.c:2269 worker_thread+0x665/0xbe0 kernel/workqueue.c:2415 kthread+0x20d/0x230 kernel/kthread.c:291 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293 read to 0xffff88809fa40a1c of 4 bytes by task 11997 on cpu 0: cma_check_port+0xbd/0x700 drivers/infiniband/core/cma.c:3506 cma_use_port drivers/infiniband/core/cma.c:3541 [inline] cma_get_port drivers/infiniband/core/cma.c:3623 [inline] rdma_bind_addr+0x1639/0x1910 drivers/infiniband/core/cma.c:3741 cma_bind_addr drivers/infiniband/core/cma.c:3252 [inline] rdma_resolve_addr+0x486/0x1240 drivers/infiniband/core/cma.c:3264 ucma_resolve_ip+0x121/0x1b0 drivers/infiniband/core/ucma.c:722 ucma_write+0x229/0x250 drivers/infiniband/core/ucma.c:1764 vfs_write+0x1d6/0x690 fs/read_write.c:576 ksys_write+0xce/0x180 fs/read_write.c:631 __do_sys_write fs/read_write.c:643 [inline] __se_sys_write fs/read_write.c:640 [inline] __x64_sys_write+0x3e/0x50 fs/read_write.c:640 do_syscall_64+0x51/0xb0 arch/x86/entry/common.c:384 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 11997 Comm: syz-executor.4 Not tainted 5.8.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 ================================================================== and on commit 5863cc79: ================================================================== BUG: KCSAN: data-race in cma_comp_exch / rdma_resolve_addr write to 0xffff8880a73bda1c of 4 bytes by task 7740 on cpu 0: cma_comp_exch+0x84/0xc0 drivers/infiniband/core/cma.c:441 addr_handler+0x80/0x2f0 drivers/infiniband/core/cma.c:3033 process_one_req+0xc2/0x3a0 drivers/infiniband/core/addr.c:644 process_one_work+0x3d4/0x890 kernel/workqueue.c:2269 worker_thread+0xa0/0x800 kernel/workqueue.c:2415 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 read to 0xffff8880a73bda1c of 4 bytes by task 30237 on cpu 1: rdma_resolve_addr+0x56/0x10e0 drivers/infiniband/core/cma.c:3156 ucma_resolve_ip+0x105/0x180 drivers/infiniband/core/ucma.c:708 ucma_write+0x1fe/0x2a0 drivers/infiniband/core/ucma.c:1684 __vfs_write+0x67/0xc0 fs/read_write.c:494 vfs_write fs/read_write.c:558 [inline] vfs_write+0x18a/0x390 fs/read_write.c:542 ksys_write+0x17b/0x1b0 fs/read_write.c:611 __do_sys_write fs/read_write.c:623 [inline] __se_sys_write fs/read_write.c:620 [inline] __x64_sys_write+0x4c/0x60 fs/read_write.c:620 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 30237 Comm: syz-executor.0 Not tainted 5.4.0-rc7+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 ==================================================================