On Thu, Sep 16, 2021 at 09:43:19AM +0200, Dmitry Vyukov wrote:
> On Wed, 15 Sept 2021 at 21:36, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > On Wed, Sep 15, 2021 at 05:41:22AM -0700, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: 926de8c4326c Merge tag 'acpi-5.15-rc1-3' of git://git.kern..
> > > git tree: upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=11fd67ed300000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=37df9ef5660a8387
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=dc3dfba010d7671e05f5
> > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+dc3dfba010d7671e05f5@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > #syz dup: KASAN: use-after-free Write in addr_resolve (2)
> >
> > Frankly, I still can't figure out how this is happening.
> >
> > RDMA_USER_CM_CMD_RESOLVE_IP triggers a background work and
> > RDMA_USER_CM_CMD_DESTROY_ID triggers destruction of the memory the
> > work touches.
> >
> > rdma_addr_cancel() is supposed to ensure that the work isn't and won't
> > run.
> >
> > So to hit this we have to either not call rdma_addr_cancel() when it
> > is needed, or rdma_addr_cancel() has to be broken and continue to allow
> > the work.
> >
> > I could find nothing along either path, though rdma_addr_cancel()
> > relies on some complicated properties of the workqueues I'm not
> > entirely positive about.
>
> I stared at the code, but it's too complex to grasp it all entirely.
> There are definitely lots of tricky concurrent state transitions and
> potential for unexpected interleavings. My bet would be on some tricky
> hard-to-trigger thread interleaving.

From a uapi perspective the entire thing is serialized with a mutex.

> The only thing I can think of is adding more WARNINGs to the code to
> check more of these assumptions. But I don't know if there are any
> useful testable assumptions...

Do you have any idea why we can't get a reproduction out of syzkaller
here?

I'm not very comfortable reading syzkaller's debug output; can you give
some idea of what it might be doing concurrently?

Jason
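
The lifetime rule discussed above (work started by RDMA_USER_CM_CMD_RESOLVE_IP must be cancelled, and any running instance waited for, before RDMA_USER_CM_CMD_DESTROY_ID frees the memory that work touches) can be illustrated with a small userspace model. This is a hypothetical pthreads sketch, not the kernel workqueue or rdma_cm code: resolve_start(), resolve_cancel_sync() and destroy_id() are invented names standing in for RESOLVE_IP, rdma_addr_cancel() and DESTROY_ID, and one thread per request replaces the kernel's workqueue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum work_state { WORK_IDLE, WORK_PENDING, WORK_RUNNING };

struct resolve_work {
	pthread_mutex_t lock;
	enum work_state state;
	int *target;		/* memory the work writes; freed by destroy_id() */
	pthread_t thread;
	bool has_thread;
};

/* Stands in for the background address-resolution work. */
static void *resolve_fn(void *arg)
{
	struct resolve_work *w = arg;

	pthread_mutex_lock(&w->lock);
	if (w->state != WORK_PENDING) {	/* cancelled before it started */
		pthread_mutex_unlock(&w->lock);
		return NULL;
	}
	w->state = WORK_RUNNING;
	pthread_mutex_unlock(&w->lock);

	*w->target += 1;	/* the write that must never hit freed memory */

	pthread_mutex_lock(&w->lock);
	w->state = WORK_IDLE;
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Stands in for RESOLVE_IP queueing the background work. */
static int resolve_start(struct resolve_work *w, int *target)
{
	pthread_mutex_lock(&w->lock);
	w->target = target;
	w->state = WORK_PENDING;
	pthread_mutex_unlock(&w->lock);

	if (pthread_create(&w->thread, NULL, resolve_fn, w) != 0) {
		pthread_mutex_lock(&w->lock);
		w->state = WORK_IDLE;
		pthread_mutex_unlock(&w->lock);
		return -1;
	}
	w->has_thread = true;
	return 0;
}

/*
 * Stands in for rdma_addr_cancel(): once this returns, the work is
 * neither running nor able to run. Returns true if it was still pending.
 */
static bool resolve_cancel_sync(struct resolve_work *w)
{
	bool was_pending;

	pthread_mutex_lock(&w->lock);
	was_pending = (w->state == WORK_PENDING);
	if (was_pending)
		w->state = WORK_IDLE;	/* resolve_fn() will see this and bail */
	pthread_mutex_unlock(&w->lock);

	if (w->has_thread) {		/* wait out a started or exiting worker */
		pthread_join(w->thread, NULL);
		w->has_thread = false;
	}
	return was_pending;
}

/* Stands in for DESTROY_ID: cancel first, and only then free. */
static void destroy_id(struct resolve_work *w, int *target)
{
	resolve_cancel_sync(w);
	assert(w->state == WORK_IDLE && !w->has_thread);
	free(target);
}
```

Clearing the pending state under the lock stops work that has not started yet, and the pthread_join() covers an instance that is already writing to the target. A use-after-free in this model would require either skipping resolve_cancel_sync() before the free, or a cancel that returns while the worker can still run, which mirrors the two paths named in the thread.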