On Sat, Jun 27, 2020 at 09:02:05PM +0800, Hillf Danton wrote:
> > So, to hit this syzkaller one of these must have happened:
> >  1) rdma_addr_cancel() didn't work and the process_one_work() is still
> >     runnable/running
>
> What syzbot reported indicates that the kworker did survive not only
> canceling work but the handler_mutex, despite it's a sync cancel that
> waits for the work to complete.

The syzbot report doesn't confirm that the cancel work was actually
called. The most likely situation is that it was skipped because of the
state mangling the patch fixes.

> >  2) The state changed away from RDMA_CM_ADDR_QUERY without doing
> >     rdma_addr_cancel()
>
> The cancel does cover the query state in the reported case, and have
> difficult time working out what's in the patch below preventing the
> work from going across the line the sync cancel draws. That's the
> question we can revisit once there is a reproducer available.

rdma-cm never seems to get reproducers from syzkaller.

Jason