On Tue, May 17, 2022 at 01:53:57PM +0800, Ming Lei wrote:
> Hello Guys,
>
> The ubd driver is a kernel driver for implementing generic userspace
> block devices/drivers. It delivers io requests from the ubd block
> device (/dev/ubdbN) to the ubd server[1], which is the userspace part
> of ubd that communicates with the ubd driver and handles the specific
> io logic in its target module.
>
> The other thing the ubd driver handles is copying data between
> userspace buffers and request/bio pages, or doing zero copy once mm is
> ready to support it in the future. The ubd driver doesn't implement
> any io logic of the specific device, so it is small and simple, and
> all io logic is done by the target code in ubdserver.
>
> The above two are the main jobs done by the ubd driver.
>
> The ubd driver helps move io logic into userspace, where development
> is easier and more effective than in the kernel. For example, ubd-loop
> takes fewer than 200 lines of loop-specific code to get basically the
> same function as the kernel loop block driver, while performance is
> still good. ubdsrv[1] provides a built-in test for comparing the two
> by running "make test T=loop".
>
> Another example is high performance qcow2 support[2], which can be
> built with the ubd framework more easily than doing it inside the
> kernel.
>
> Also more people have expressed interest in userspace block
> drivers[3]: Gabriel Krisman Bertazi proposed this topic at LSF/MM/eBPF
> 2022 and mentioned a requirement from Google. Ziyang Zhang from
> Alibaba said they "plan to replace TCMU by UBD as a new choice"
> because UBD can get better throughput than TCMU even with a single
> queue[4], while UBD stays simple. There are also userspace storage
> services providing storage to containers.
>
> It is io_uring based: io requests are delivered to userspace via the
> newly added io_uring command, which has been proved very efficient by
> making nvme passthrough IO reach better IOPS than io_uring
> (READ/WRITE). Meanwhile one shared/mmap buffer is used for sharing the
> io descriptors with userspace; the buffer is readonly for userspace,
> and each IO takes just 24 bytes so far. It is suggested to use
> io_uring in userspace (the target part of the ubd server) to handle
> io requests too. It is still easy for ubdserver to support io handling
> without io_uring; that work isn't done yet, but it can be supported
> easily with the help of eventfd.
>
> This way is efficient since no extra io command copy is required and
> no sleep is needed in transferring io commands to userspace. Meanwhile
> the communication protocol is simple and efficient: a single
> UBD_IO_COMMIT_AND_FETCH_REQ command handles both fetching the io
> request descriptor and committing the previous command result in one
> trip. IO handling is often batched after a single io_uring_enter()
> returns; both io requests from the ubd server target and io commands
> can be handled as a whole batch.
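
To make the command flow above a bit more concrete, below is a rough
sketch of how one queue loop of the ubd server could look on top of
liburing. It is only an illustration under assumptions: struct
ubd_io_desc, UBD_IO_FETCH_REQ, the opcode values and the SQE encoding
in queue_ubd_cmd() are simplified stand-ins rather than the uapi
defined by this patchset (see the uapi header and ubdsrv[1] for the
real definitions), setup (opening the ubd char device, mmapping the
descriptor buffer, io_uring_queue_init()) is omitted, and it assumes
liburing/kernel headers new enough to expose IORING_OP_URING_CMD and
the SQE cmd area.

#include <liburing.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define QUEUE_DEPTH	128

/* placeholder opcodes; the real values come from the ubd uapi header */
#define UBD_IO_FETCH_REQ		0x20
#define UBD_IO_COMMIT_AND_FETCH_REQ	0x21

/* hypothetical layout of the 24-byte readonly descriptor shared via mmap */
struct ubd_io_desc {
	uint32_t op_flags;	/* REQ_OP_* in the low bits, flags above */
	uint32_t nr_sectors;
	uint64_t start_sector;
	uint64_t addr;		/* per-tag io buffer in ubdsrv address space */
};

static struct ubd_io_desc *io_descs;	/* mmap()ed readonly from the driver */
static int backing_fd;			/* ubd-loop style target: a plain file */

/*
 * Encode one uring_cmd SQE carrying (cmd_op, tag, result).  The SQE
 * encoding is the part most likely to differ from the real driver, so it
 * is kept behind this helper; error handling is omitted.
 */
static void queue_ubd_cmd(struct io_uring *ring, int dev_fd,
			  uint32_t cmd_op, uint16_t tag, int32_t result)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	io_uring_prep_rw(IORING_OP_URING_CMD, sqe, dev_fd, NULL, 0, 0);
	sqe->cmd_op = cmd_op;		/* e.g. UBD_IO_COMMIT_AND_FETCH_REQ */
	sqe->user_data = tag;
	memcpy(sqe->cmd, &result, sizeof(result));	/* schematic payload */
}

/* ubd-loop style handling: serve READ/WRITE from the backing file */
static int handle_io(const struct ubd_io_desc *iod)
{
	void *buf = (void *)(uintptr_t)iod->addr;
	size_t len = (size_t)iod->nr_sectors << 9;
	off_t off = (off_t)iod->start_sector << 9;

	if ((iod->op_flags & 0xff) == 1)	/* assumed WRITE encoding */
		return (int)pwrite(backing_fd, buf, len, off);
	return (int)pread(backing_fd, buf, len, off);
}

static void queue_loop(struct io_uring *ring, int dev_fd)
{
	struct io_uring_cqe *cqe;
	unsigned int head, handled;

	/* prime the queue: one FETCH per tag so the driver can hand out
	 * io descriptors as block requests arrive */
	for (uint16_t tag = 0; tag < QUEUE_DEPTH; tag++)
		queue_ubd_cmd(ring, dev_fd, UBD_IO_FETCH_REQ, tag, -1);

	for (;;) {
		io_uring_submit_and_wait(ring, 1);

		handled = 0;
		io_uring_for_each_cqe(ring, head, cqe) {
			uint16_t tag = (uint16_t)cqe->user_data;
			int res = handle_io(&io_descs[tag]);

			/* one command commits this tag's result and asks
			 * for the next request on the same tag */
			queue_ubd_cmd(ring, dev_fd,
				      UBD_IO_COMMIT_AND_FETCH_REQ, tag, res);
			handled++;
		}
		io_uring_cq_advance(ring, handled);
	}
}

The point of UBD_IO_COMMIT_AND_FETCH_REQ shows up here: each completed
request re-arms its own tag with the same command, so steady-state io
needs only one uring_cmd per request, and those commands can be batched
in the same io_uring_enter() together with the target's own io when the
target also uses the ring (the synchronous pread/pwrite above is only to
keep the sketch short).
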
>
> RFC is removed now because the ubd driver code gets lots of cleanups,
> enhancements and bug fixes since V1:
>
> - cleanup uapi: remove ubd specific error codes, switch to linux error
>   codes, remove one command op, remove one field from cmd_desc
>
> - add a monitor mechanism to handle the ubq_daemon being killed;
>   ubdsrv[1] includes builtin tests covering heavy IO while deleting
>   the ubd device / killing the ubq_daemon at the same time, V2 passes
>   both tests (make test T=generic), and the abort/stop mechanism is
>   simple
>
> - fix the MQ command buffer mmap bug, and now 'xfstests -g auto' works
>   well on MQ ubd-loop devices (test/scratch)
>
> - improve batching submission as suggested by Jens
>
> - improve handling of starting the device: replace the random
>   wait/poll with a completion
>
> - all kinds of cleanups, bug fixes, ...
>
> And the patch by patch changes since V1 can be found in the following
> tree:
>
> https://github.com/ming1/linux/commits/my_for-5.18-ubd-devel_v2

BTW, a one-line fix[1] has been added to the above branch, which
noticeably improves performance in small BS (< 128k) tests. If anyone
runs performance tests, please include this fix.

[1] https://github.com/ming1/linux/commit/fa91354b418e83953304a3efad4ee6ac40ea6110

Thanks,
Ming