Hi Ming,

I was studying the loop.c DIO & AIO changes you made back in 2015 that increased loop performance and reduced the memory footprint (bc07c10a3603a5ab3ef01ba42b3d41f9ac63d1b6). I have a few questions, if you are able to comment. Here is a quick summary of my understanding.

The direct I/O path starts by queuing the work:

  .queue_rq = loop_queue_rq:
    -> loop_queue_work(lo, cmd);
      -> INIT_WORK(&worker->work, loop_workfn);
         ...
         queue_work(lo->workqueue, work);

Then, from within the workqueue:

  -> loop_workfn()
    -> loop_process_work(worker, &worker->cmd_list, worker->lo);
      -> loop_handle_cmd(cmd);
        -> do_req_filebacked(lo, blk_mq_rq_from_pdu(cmd));
          -> lo_rw_aio(lo, cmd, pos, READ) // (or WRITE)
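To check that I am reading the control flow correctly, here is a small single-threaded user-space model of it. This is only a sketch under my own assumptions, not kernel code: every name (sim_queue_rq, sim_workfn, sim_handle_cmd, sim_rw_aio, struct sim_worker) is hypothetical, the "workqueue" is just a flag plus a FIFO list, and the "AIO submission" only counts reads and writes. The point it illustrates is that the ->queue_rq side only appends the command and marks the work as queued, while all of the actual I/O dispatch happens later, in the worker, in FIFO order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum sim_dir { SIM_READ, SIM_WRITE };

struct sim_cmd {
    long long pos;          /* byte offset in the backing file */
    enum sim_dir dir;       /* READ or WRITE, like lo_rw_aio()'s last argument */
    bool done;
    struct sim_cmd *next;   /* stands in for the kernel's list_head linkage */
};

struct sim_worker {
    struct sim_cmd *head, *tail; /* FIFO of pending commands (worker->cmd_list) */
    bool queued;                 /* set when the model "calls queue_work()" */
    int reads, writes;           /* counts of "submitted" I/O */
};

/* Models lo_rw_aio(): only records the direction and marks the cmd done. */
static void sim_rw_aio(struct sim_worker *w, struct sim_cmd *cmd)
{
    if (cmd->dir == SIM_READ)
        w->reads++;
    else
        w->writes++;
    cmd->done = true;
}

/* Models loop_handle_cmd() -> do_req_filebacked(). */
static void sim_handle_cmd(struct sim_worker *w, struct sim_cmd *cmd)
{
    sim_rw_aio(w, cmd);
}

/* Models loop_queue_rq() -> loop_queue_work(): append to the list and mark
 * the work as queued. Runs in the submitter's context; does no I/O itself. */
static void sim_queue_rq(struct sim_worker *w, struct sim_cmd *cmd)
{
    assert(cmd != NULL);
    cmd->next = NULL;
    cmd->done = false;
    if (w->tail)
        w->tail->next = cmd;
    else
        w->head = cmd;
    w->tail = cmd;
    w->queued = true;
}

/* Models loop_workfn() -> loop_process_work(): drain the list in order.
 * Returns the number of commands handled. */
static int sim_workfn(struct sim_worker *w)
{
    int n = 0;
    while (w->head) {
        struct sim_cmd *cmd = w->head;
        w->head = cmd->next;
        if (!w->head)
            w->tail = NULL;
        sim_handle_cmd(w, cmd);
        n++;
    }
    w->queued = false;
    return n;
}
```

In this model, queuing two commands leaves both pending (done == false) until sim_workfn() runs, which then handles them in submission order.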