On Thu, Sep 16, 2021 at 04:45:27PM +0200, Dmitry Vyukov wrote:

> Answering your question re what was running concurrently with what.
> Each of the syscalls in these programs can run up to 2 times and
> ultimately any of these calls can race with any. Potentially syzkaller
> can predict values kernel will return (e.g. id's) before kernel
> actually returned them. I guess this does not restrict search area for
> the bug a lot...

I have a reasonable theory now..

Based on the ops you provided this FSM sequence is possible

RDMA_USER_CM_CMD_RESOLVE_IP
  RDMA_CM_IDLE -> RDMA_CM_ADDR_QUERY
  does rdma_resolve_ip(addr_handler)

                          addr_handler
                            RDMA_CM_ADDR_QUERY -> RDMA_CM_ADDR_BOUND
                            [.. handler still running ..]

RDMA_USER_CM_CMD_RESOLVE_IP
  RDMA_CM_ADDR_BOUND -> RDMA_CM_ADDR_QUERY
  does rdma_resolve_ip(addr_handler)

RDMA_DESTROY_ID
  rdma_addr_cancel()

Which, if it happens fast enough, could trigger a situation where the
'&id_priv->id.route.addr.dev_addr' "handle" is in the req_list twice,
because the addr_handler work queue hasn't yet got to the point of
deleting the first entry from the req_list before the 2nd one is added.

The issue is that rdma_addr_cancel() has to be called before
rdma_resolve_ip() can be called again. Skipping it will cause 'req_list'
to have two items in the internal linked list with the same key, and
the single rdma_addr_cancel() at destroy time will not cancel the newest
one, which still has an active timer.

This would cause a use after free like the one this trace is showing.

I can make a patch, but have no way to know if it is any good :\

Jason
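
To illustrate the req_list problem described above, here is a rough user-space model, not kernel code. It assumes that a cancel removes only the first list entry whose key matches the handle, which is what "will not cancel the newest one" implies. struct req, queue_req(), cancel_req() and pending() are made-up names for this sketch; there are no timers, work queues or locking in it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one pending address-resolution request. */
struct req {
	const void *handle;	/* lookup key, models &id_priv->id.route.addr.dev_addr */
	int seq;		/* order in which the request was queued */
	struct req *next;
};

struct req *req_list;		/* models the internal req_list */
int next_seq;

/* Append a request for 'handle'. Nothing here rejects a duplicate key. */
int queue_req(const void *handle)
{
	struct req *r = malloc(sizeof(*r));
	struct req **pp = &req_list;

	assert(r != NULL);
	r->handle = handle;
	r->seq = ++next_seq;
	r->next = NULL;
	while (*pp)
		pp = &(*pp)->next;
	*pp = r;
	return r->seq;
}

/*
 * Assumed cancel behaviour: unlink and free only the FIRST entry that
 * matches 'handle'. Returns the seq of the removed entry, or 0 if none.
 */
int cancel_req(const void *handle)
{
	struct req **pp;

	for (pp = &req_list; *pp; pp = &(*pp)->next) {
		if ((*pp)->handle == handle) {
			struct req *r = *pp;
			int seq = r->seq;

			*pp = r->next;
			free(r);
			return seq;
		}
	}
	return 0;
}

/* Count the entries still queued for 'handle'. */
int pending(const void *handle)
{
	const struct req *r;
	int n = 0;

	for (r = req_list; r; r = r->next)
		if (r->handle == handle)
			n++;
	return n;
}
```

Calling queue_req() twice with the same handle and then cancel_req() once leaves pending() at 1, and the leftover entry is the second, newest request. In the real code that leftover would be the one whose timer is still armed when the id is freed.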