On 3/17/22 3:20 AM, Chaitanya Kulkarni wrote: > Eric, > > On 3/16/22 7:26 PM, Eric Wheeler wrote: >> [Some people who received this message don't often get email from linux-block@xxxxxxxxxxxxxxxxxx. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.] >> >> Hi Ming, >> >> I was studying the loop.c DIO & AIO changes you made back in 2015 that >> increased loop performance and reduced the memory footprint >> (bc07c10a3603a5ab3ef01ba42b3d41f9ac63d1b6). >> >> I have a few questions if you are able to comment, here is a quick >> summary: >> >> The direct IO path starts by queuing the work: >> >> .queue_rq = loop_queue_rq: >> >> -> loop_queue_work(lo, cmd); >> -> INIT_WORK(&worker->work, loop_workfn); >> ... queue_work(lo->workqueue, work); >> >> Then from within the workqueue: >> >> -> loop_workfn() >> -> loop_process_work(worker, &worker->cmd_list, worker->lo); >> -> loop_handle_cmd(cmd); >> -> do_req_filebacked(lo, blk_mq_rq_from_pdu(cmd) ); >> -> lo_rw_aio(lo, cmd, pos, READ) // (or WRITE) >> >> From here the kiocb is setup and this is the 5.17-rc8 code at the >> bottom of lo_rw_aio() when it sets up the dispatch to the filesystem: >> >> cmd->iocb.ki_pos = pos; >> cmd->iocb.ki_filp = file; >> cmd->iocb.ki_complete = lo_rw_aio_complete; >> cmd->iocb.ki_flags = IOCB_DIRECT; >> cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0); >> >> if (rw == WRITE) >> ret = call_write_iter(file, &cmd->iocb, &iter); >> else >> ret = call_read_iter(file, &cmd->iocb, &iter); >> >> lo_rw_aio_do_completion(cmd); >> >> if (ret != -EIOCBQUEUED) >> lo_rw_aio_complete(&cmd->iocb, ret); >> >> >> After having called `call_read_iter` it is in the filesystem's >> handler. >> >> Since ki_complete is defined, does that mean the filesystem will _always_ >> take these in and always queue these internally and return -EIOCBQUEUED >> from call_read_iter()? Another way to ask: if ki_complete!=NULL, can a >> filesystem ever behave synchronously? (Is there documentation about this >> somewhere? I couldn't find anything definitive.) >> > > a non-null ki_complete asks for async I/O and that is what we need to > get the higher performance. > >> >> About the cleanup after dispatch at the bottom of lo_rw_aio() from this >> code (also shown above): >> >> lo_rw_aio_do_completion(cmd); >> >> if (ret != -EIOCBQUEUED) >> lo_rw_aio_complete(&cmd->iocb, ret); >> >> * It appears that lo_rw_aio_do_completion() will `kfree(cmd->bvec)`. If >> the filesystem queued the cmd->iocb for internal use, would it have made >> a copy of cmd->bvec so that this is safe? >> >> * If ret != -EIOCBQUEUED, then lo_rw_aio_complete() is called which calls >> lo_rw_aio_do_completion() a second time. Now lo_rw_aio_do_completion >> does do this ref check, so it _is_ safe: >> >> if (!atomic_dec_and_test(&cmd->ref)) >> return; >> >> For my own understanding, is this equivalent? >> >> - lo_rw_aio_do_completion(cmd); >> >> if (ret != -EIOCBQUEUED) >> lo_rw_aio_complete(&cmd->iocb, ret); >> + else >> + lo_rw_aio_do_completion(cmd); >> >> >> >> > > I think the purpose of refcount is to make sure we free the request in > lo_rw_aio_do_completion() whoever finishes last either submission thread > or fs completion ctx calling ki_complete() -> lo_rw_aio_complete(). > what I also meant here is that if fs completion happens before we exit the lo_rw_aio() then we should not proceed with kfree(cmd->bvec) but only decrement the ref-count and wait for lo_rw_aio() to decrement the ref-count by one more time to kfree(cmd->bvec) before exit. -ck