Re: [PATCH] Docs: ublk: add ublk document

Ziyang Zhang <ZiyangZhang@xxxxxxxxxxxxxxxxx> · Thu, 1 Sep 2022 10:47:36 +0800

On 2022/9/1 09:34, Ming Lei wrote:
> 
>>      This makes existing backend very hard to adapt to ublk because they may
>>      want to know the data length or other attributes of the new request.
> 
> It is just for existing project.

Existing projects are very important. I believe that embedding libublksrv or
the UAPI into existing projects (products) will make ublk more popular and useful.

> 
> Any new project can read the data from the pre-allocated buffer
> directly. That is exactly the handling flow: ublksrv gets one request from
> ublk driver, then let backend handle the request.

You are correct, Ming. ublksrv targets do not need UBLK_IO_NEED_GET_DATA.

> 
>>
>> (2) If the backend does not provide the data buffer IN ADVANCE, ublksrv must
>>     pre-allocate the data buffer. So an additional data copy from ublksrv to
>>     the backend (such as an RPC mempool) is unavoidable.
> 
> Can you explain why backend can't use the pre-allocated buffer directly? Before
> backend completes the io request, the io request and buffer won't be reused, that
> is owned by this tag/slot.

Why must an existing project that uses ublksrv use ublksrv's pre-allocated buffer?
The backend has its own buffer management.

Besides, existing projects may embed libublksrv or the UAPI directly.
UBLK_IO_NEED_GET_DATA is just an option for them.

Ming, the UBLK_IO_NEED_GET_DATA use cases have proved useful, and we discussed
them when I introduced it into the kernel driver. Really, (1) users who run
ublksrv directly and (2) developers who implement new ublksrv targets do not
have to care about it.

> 
>>
>> With UBLK_IO_NEED_GET_DATA, the WRITE request will be firstly issued to ublksrv
>> without data copy. Then, backend gets the request and it can allocate data
>> buffer and embed its addr inside a new ioucmd. After the kernel driver gets the
>> ioucmd, the data copy happens(from biovecs to backend's buffer). Finally,
>> the backend gets the request again with data to be written and it can truly
>> handle the request.
> 
> That is definitely inefficient, and I won't encourage any new project to
> use this command.

UBLK_IO_NEED_GET_DATA is an option. Any user who thinks it may lower performance
should not use it.

BTW, our tests show that UBLK_IO_NEED_GET_DATA adds one additional
round-trip in ublk_drv and one io_uring_enter() syscall.
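
To make the extra round-trip concrete, here is a toy C model of the WRITE flow
described in the quoted text above. It is only a sketch: it is not ublk_drv or
libublksrv code, and every name in it (toy_write, toy_driver_dispatch,
toy_need_get_data, toy_backend_commit) is hypothetical. It assumes the request
reaches the server first without data, the backend then supplies its own
buffer, and only then does the copy happen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: hypothetical names, not the ublk UAPI. */
enum toy_state { TOY_NEED_DATA, TOY_DATA_READY, TOY_DONE };

struct toy_write {
    const char *bio_data;  /* stands in for the biovecs of the WRITE */
    size_t len;            /* data length, visible before any copy */
    char *backend_buf;     /* chosen by the backend, NULL until provided */
    enum toy_state state;
    int round_trips;       /* driver -> server notifications so far */
};

/* Step 1: the driver hands the WRITE to the server without copying data. */
static void toy_driver_dispatch(struct toy_write *w, const char *data, size_t len)
{
    w->bio_data = data;
    w->len = len;
    w->backend_buf = NULL;
    w->state = TOY_NEED_DATA;
    w->round_trips = 1;
}

/*
 * Step 2: the backend has seen w->len, allocated its own buffer, and passes
 * the address back. The driver side copies into that buffer and notifies the
 * server again; this is the additional round-trip.
 */
static void toy_need_get_data(struct toy_write *w, char *buf)
{
    assert(w->state == TOY_NEED_DATA && buf != NULL);
    w->backend_buf = buf;
    memcpy(w->backend_buf, w->bio_data, w->len);
    w->state = TOY_DATA_READY;
    w->round_trips++;
}

/* Step 3: the backend handles the data and completes the request. */
static int toy_backend_commit(struct toy_write *w)
{
    assert(w->state == TOY_DATA_READY);
    w->state = TOY_DONE;
    return (int)w->len;
}
```

In this model the data lands directly in the backend's buffer, so there is no
second copy out of a pre-allocated buffer, at the price of round_trips being 2
instead of 1 before the backend can handle the WRITE.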

UBLK_IO_NEED_GET_DATA does not lower the IOPS too much if:
(1) iodepth is large. This is because io_uring batches sqes (ioucmds), so the
    syscall overhead is amortized and is not significant.
(2) the backend is slow. For example, with a network (RPC) backend, we really
    do not care about this round-trip since the backend IO handling
    is far slower than ublk_drv's data path.
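
The two conditions above can be sketched as simple arithmetic. This is a
hypothetical cost model, not a measurement: the function names and any
nanosecond values passed in are illustrative assumptions, and it assumes one
io_uring_enter() call submits a whole batch of ioucmds.

```c
#include <assert.h>

/*
 * Hypothetical model for (1): if one io_uring_enter() submits `batch`
 * ioucmds, the syscall cost charged to each IO shrinks as the batch grows.
 */
static double toy_extra_syscall_ns_per_io(double syscall_ns, unsigned batch)
{
    assert(batch > 0);
    return syscall_ns / batch;
}

/*
 * Hypothetical model for (2): fraction of per-IO time spent on the extra
 * round-trip. The slower the backend, the smaller this fraction.
 */
static double toy_extra_fraction(double extra_ns, double backend_ns)
{
    assert(extra_ns >= 0.0 && backend_ns > 0.0);
    return extra_ns / (backend_ns + extra_ns);
}
```

With any fixed syscall cost, a deeper queue lowers the first value, and a
slower backend lowers the second, which matches the two cases listed above.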

In conclusion, UBLK_IO_NEED_GET_DATA is designed for existing projects, not for
ublksrv targets (though ublksrv supports this feature). UBLK_IO_NEED_GET_DATA is
COMPLETELY motivated by our real-world experience developing userspace storage
products.

Regards,
Zhang