On Wed, May 23, 2018 at 01:45:30AM +0100, Al Viro wrote:
> Oh, bugger...
>
> wakeup
> 	removed from queue
> 	schedule __aio_poll_complete()
>
> cancel
> 	grab ctx->lock
> 	remove from list
> work
> 	aio_complete()
> 		check if it's in the list
> 		it isn't, move on to free the sucker
> cancel
> 	call ->ki_cancel()
> 	BOOM
>
> Looks like we want to call ->ki_cancel() *BEFORE* removing from the list,
> as well as doing fput() after aio_complete().  The same ordering, BTW, goes
> for aio_read() et.al.
>
> Look:
> CPU1:	io_cancel() grabs ->ctx_lock, finds iocb and removes it from the list.
> CPU2:	aio_rw_complete() on that iocb.  Since the sucker is not in the list
> 	anymore, we do NOT spin on ->ctx_lock and proceed to free iocb
> CPU1:	pass freed iocb to ->ki_cancel().  BOOM.

BTW, it seems that the mainline is vulnerable to this one.  I might be
missing something, but...
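
For illustration only, the interleaving above can be replayed in a
single-threaded user-space model.  This is a sketch, not kernel code: every
name in it (model_req, model_complete, model_ki_cancel, model_cancel_buggy,
model_cancel_fixed) is hypothetical, the lock is reduced to a "canceller
holds it" flag, and "freeing" is just a flag so the use-after-free can be
observed instead of crashing.  It assumes, as described above, that the
completion side only waits on the lock while the request is still on the list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one in-flight request; not a kernel structure. */
struct model_req {
	bool on_list;		/* still on the context's active list */
	bool freed;		/* completion side has released it */
	bool use_after_free;	/* cancel callback saw a freed request */
};

/*
 * Completion side (CPU2).  If the request is still on the list and the
 * canceller holds the lock, completion has to wait, modeled here by
 * returning false.  Otherwise it goes straight on to free the request.
 */
static bool model_complete(struct model_req *r, bool canceller_holds_lock)
{
	assert(!r->freed);	/* a request is completed at most once */
	if (r->on_list && canceller_holds_lock)
		return false;
	r->on_list = false;
	r->freed = true;
	return true;
}

/* Stand-in for the ->ki_cancel() callback; records a use-after-free. */
static void model_ki_cancel(struct model_req *r)
{
	if (r->freed)
		r->use_after_free = true;
}

/* Ordering from the report: remove from the list first, cancel second. */
static bool model_cancel_buggy(struct model_req *r)
{
	/* CPU1: lock taken */
	r->on_list = false;		/* removed from the list */
	model_complete(r, true);	/* CPU2 races in, does not wait, frees */
	model_ki_cancel(r);		/* CPU1 touches a freed request */
	/* CPU1: lock dropped */
	return r->use_after_free;
}

/* Proposed ordering: cancel while still on the list, remove afterwards. */
static bool model_cancel_fixed(struct model_req *r)
{
	/* CPU1: lock taken */
	model_complete(r, true);	/* CPU2 races in, has to wait */
	model_ki_cancel(r);		/* request is still alive here */
	r->on_list = false;		/* only now removed from the list */
	/* CPU1: lock dropped */
	model_complete(r, false);	/* CPU2 proceeds and frees */
	return r->use_after_free;
}
```

In this model model_cancel_buggy() reports the use-after-free and
model_cancel_fixed() does not.  It says nothing about the fput() ordering,
which would need the file reference modeled as well.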