On 2023/4/19 10:31, Ming Lei wrote:
> On Wed, Apr 19, 2023 at 12:03:40AM +0000, Harris, James R wrote:
>> Hi,
>>
>> I'm working on adding NEED_GET_DATA support to the SPDK ublk server, to avoid allocating I/O buffers until they are actually needed.
>>
>> It is very clear how this works for write commands with NEED_GET_DATA. We wait to allocate the buffer until we get the UBLK_IO_RES_NEED_GET_DATA completion and submit again using UBLK_IO_NEED_GET_DATA. After we get the UBLK_IO_RES_OK completion from ublk, we submit the block request to the SPDK bdev layer. After it completes, we submit using UBLK_IO_COMMIT_AND_FETCH_REQ and can free the I/O buffer, because the data has been committed.
>>
>> But how does this work for the read path? On a read, I can wait to allocate the buffer until I get the UBLK_IO_RES_OK completion. But after the read operation completes and SPDK submits the UBLK_IO_COMMIT_AND_FETCH_REQ, how do I know when ublk has finished copying data out of the buffer, so that I can reuse that buffer?
>>
>> I'm sure I'm missing something obvious; if anyone can provide a pointer, I would appreciate it.
>
> Yeah, it is a known issue.
>
> But the buffer is guaranteed to be ready for reuse after io_uring_enter()
> returns.
>
> Also, the patch [1] exposes a read/write interface on /dev/ublkcN, which
> makes NEED_GET_DATA unnecessary. The initial motivation was zero-copy,
> but it can be used for non-zero-copy too: the driver does no copy at all,
> and userspace simply covers everything:
>
> 1) the ublk server gets one io command, allocates a buffer, and handles
> the command
>
> 2) if it is a WRITE command, read the request data into the buffer via
> read() from /dev/ublkcN, then handle the write using this buffer
>
> 3) if it is a READ command, handle the read using the allocated buffer,
> and once the data is ready in the buffer, write the buffer back to
> /dev/ublkcN.
>
> This approach could be cleaner for both sides, but it adds the extra cost
> of the read/write on /dev/ublkcN in userspace; that can be done via
> io_uring too.
>
> What do you think of this approach?
>
> [1] https://lore.kernel.org/linux-block/20230330113630.1388860-16-ming.lei@xxxxxxxxxx/

This idea is good, IMO.

Regards,
Zhang

>
>
> Thanks,
> Ming
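
For readers following along, here is a minimal sketch (not from the original mail) of a server-side handler for the three steps Ming describes above, assuming the patched /dev/ublkcN accepts pread()/pwrite(). The helpers io_off(), handle_read() and handle_write() are hypothetical stand-ins: io_off() for however the driver maps (queue id, tag) to a per-io file offset, the other two for the backend (e.g. the SPDK bdev layer). A real server would issue these through io_uring rather than blocking syscalls, and would follow up with UBLK_IO_COMMIT_AND_FETCH_REQ as usual.

#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* hypothetical: map queue id + tag to the per-io offset in /dev/ublkcN */
extern off_t io_off(uint16_t q_id, uint16_t tag);
/* hypothetical backend handlers, return 0 on success */
extern int handle_read(void *buf, uint64_t nr_sectors, uint64_t start_sector);
extern int handle_write(const void *buf, uint64_t nr_sectors, uint64_t start_sector);

/* One io command, already fetched via UBLK_IO_(COMMIT_AND_)FETCH_REQ */
static int handle_io(int cdev_fd, uint16_t q_id, uint16_t tag,
		     int op_is_write, uint64_t nr_sectors, uint64_t start_sector)
{
	size_t len = nr_sectors << 9;
	void *buf = malloc(len);	/* 1) allocate only when the io arrives */

	if (!buf)
		return -1;

	if (op_is_write) {
		/* 2) WRITE: pull the request data out via read() on /dev/ublkcN */
		if (pread(cdev_fd, buf, len, io_off(q_id, tag)) != (ssize_t)len)
			goto fail;
		if (handle_write(buf, nr_sectors, start_sector))
			goto fail;
	} else {
		/* 3) READ: fill the buffer, then push it back via write() */
		if (handle_read(buf, nr_sectors, start_sector))
			goto fail;
		if (pwrite(cdev_fd, buf, len, io_off(q_id, tag)) != (ssize_t)len)
			goto fail;
	}
	/* the copy is done once the syscall returns, so the buffer can be
	 * freed or reused immediately; then commit the result to ublk */
	free(buf);
	return 0;
fail:
	free(buf);
	return -1;
}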