On Sun, Jun 28, 2020 at 12:25 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Sat, Jun 27, 2020 at 09:02:05PM +0800, Hillf Danton wrote:
> > > So, to hit this syzkaller one of these must have happened:
> > > 1) rdma_addr_cancel() didn't work and the process_one_work() is still
> > > runnable/running
> >
> > What syzbot reported indicates that the kworker did survive not only
> > canceling work but the handler_mutex, despite it's a sync cancel that
> > waits for the work to complete.
>
> The syzbot report doesn't confirm that the cancel work was actually
> called.
>
> The most likely situation is that it was skipped because of the state
> mangling the patch fixes.
>
> > > 2) The state changed away from RDMA_CM_ADDR_QUERY without doing
> > > rdma_addr_cancel()
> >
> > The cancel does cover the query state in the reported case, and have
> > difficult time working out what's in the patch below preventing the
> > work from going across the line the sync cancel draws. That's the
> > question we can revisit once there is a reproducer available.
>
> rdma-cm never seems to get reproducers from syzkaller

+syzkaller mailing list

Hi Jason,

Wonder if there is some systematic issue. Let me double check.