On Wed, 15 Sept 2021 at 21:36, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Wed, Sep 15, 2021 at 05:41:22AM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    926de8c4326c Merge tag 'acpi-5.15-rc1-3' of git://git.kern..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=11fd67ed300000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=37df9ef5660a8387
> > dashboard link: https://syzkaller.appspot.com/bug?extid=dc3dfba010d7671e05f5
> > compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+dc3dfba010d7671e05f5@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> #syz dup: KASAN: use-after-free Write in addr_resolve (2)
>
> Frankly, I still can't figure out how this is happening
>
> RDMA_USER_CM_CMD_RESOLVE_IP triggers a background work and
> RDMA_USER_CM_CMD_DESTROY_ID triggers destruction of the memory the
> work touches.
>
> rdma_addr_cancel() is supposed to ensure that the work isn't and won't
> run.
>
> So to hit this we have to either not call rdma_addr_cancel() when it
> is need, or rdma_addr_cancel() has to be broken and continue to allow
> the work.
>
> I could find nothing along either path, though rdma_addr_cancel()
> relies on some complicated properties of the workqueues I'm not
> entirely positive about.

I stared at the code, but it's too complex to grasp it all entirely.
There are definitely lots of tricky concurrent state transitions and
potential for unexpected interleavings. My bet would be on some tricky
hard-to-trigger thread interleaving.

The only thing I can think of is adding more WARNINGs to the code to
check more of these assumptions. But I don't know if there are any
useful testable assumptions...
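
To make the "testable assumption" idea a bit more concrete, here is a
toy userspace model in Python. It is not kernel code and does not use
the RDMA CM or workqueue APIs; the Request class and all of its method
names are made up for illustration. It only covers the first failure
path discussed above (destroy without a preceding cancel) and counts a
violation where the kernel would presumably fire a WARN.

```python
import threading


class Request:
    """Hypothetical stand-in for the memory the background work touches."""

    def __init__(self):
        self.lock = threading.Lock()
        self.work_pending = False   # work has been queued and not yet run
        self.destroyed = False      # memory has been "freed"
        self.violations = 0         # times work ran on a destroyed object

    def queue_work(self):
        # Models the resolve command scheduling background work.
        with self.lock:
            self.work_pending = True

    def cancel(self):
        # Models the cancel step: after it returns, work must not run.
        with self.lock:
            self.work_pending = False

    def run_work(self):
        # Models the worker picking up the item. Returns True if it ran.
        with self.lock:
            if not self.work_pending:
                return False
            self.work_pending = False
            if self.destroyed:
                # The assumption under test: work never sees freed memory.
                self.violations += 1
            return True

    def destroy(self, call_cancel=True):
        # Models the destroy command; call_cancel=False is the suspected bug.
        if call_cancel:
            self.cancel()
        with self.lock:
            self.destroyed = True


# Correct ordering: cancel before destroy, so the work never runs.
good = Request()
good.queue_work()
good.destroy()
good_ran = good.run_work()

# Buggy ordering: destroy without cancel, so the work runs after the free.
bad = Request()
bad.queue_work()
bad.destroy(call_cancel=False)
bad_ran = bad.run_work()
```

The model is deliberately single-threaded and deterministic, so it says
nothing about the second path (cancel itself being racy); it only shows
what kind of invariant a WARNING could check.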