Hi Jens,

On Thu, Jan 19, 2023 at 11:49:04AM -0700, Jens Axboe wrote:
> On 1/19/23 7:23 AM, Ming Lei wrote:
> > Hi,
> >
> > ublk-nbd[1] is available now.
> >
> > Basically it is one nbd client, but totally implemented in userspace,
> > and wrt. current nbd-client in [2], the transmission phase is done
> > by linux block nbd driver.
> >
> > The handshake implementation is borrowed from nbd project[2], so
> > basically ublk-nbd just adds new code for implementing transmission
> > phase, and it can be thought as moving linux block nbd driver into
> > userspace.
> >
> > The added new code is basically in nbd/tgt_nbd.cpp, and io handling
> > is based on liburing[3], and implemented by c++20 coroutine, so
> > everything is done in single pthread totally lockless, meantime turns
> > out it is pretty easy to design & implement, attributed to ublk framework,
> > c++20 coroutine and liburing.
> >
> > ublk-nbd supports both tcp and unix socket, and allows to enable io_uring
> > send zero copy via command line '--send_zc', see details in README[4].
> >
> > No regression is found in xfstests by using ublk-nbd as both test device
> > and scratch device, and builtin test(make test T=nbd) runs well.
> >
> > Fio test("make test T=nbd") shows that ublk-nbd performance is
> > basically same with nbd-client/nbd driver when running fio on real
> > ethernet link(1g, 10+g), but ublk-nbd IOPS is higher by ~40% than
> > nbd-client(nbd driver) with 512K BS, which is because linux nbd
> > driver sets max_sectors_kb as 64KB at default.
> >
> > But when running fio over local tcp socket, it is observed in my test
> > machine that ublk-nbd performs better than nbd-client/nbd driver,
> > especially with 2 queue/2 jobs, and the gap could be 10% ~ 30%
> > according to different block size.
>
> This is pretty nice!
> Just curious, have you tried setting up your
> ring with
>
> p.flags |= IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN;
>
> and see if that yields any extra performance improvements for you?
> Depending on how you do processing, you should not need to do any
> further changes there.
>
> A "lighter" version is just setting IORING_SETUP_COOP_TASKRUN.

IORING_SETUP_COOP_TASKRUN is already enabled in current ublksrv.

After disabling COOP_TASKRUN and enabling SINGLE_ISSUER and DEFER_TASKRUN,
I do not see any obvious improvement; meanwhile, a regression is observed
on 64k rw.

Thanks,
Ming
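
For reference, the two ring setups compared above can be sketched roughly
as below. This is only an illustration, not the actual ublksrv code: the
TaskrunMode enum and the taskrun_flags() helper are hypothetical names,
and the fallback flag values are an assumption that only matters when the
installed <linux/io_uring.h> is missing or predates these flags.

```cpp
#include <cassert>
#include <cstdint>

// Use the kernel's own definitions when the uapi header is available.
#if defined(__has_include)
#  if __has_include(<linux/io_uring.h>)
#    include <linux/io_uring.h>
#  endif
#endif

// Fallbacks for missing or older headers (assumed to mirror upstream values).
#ifndef IORING_SETUP_COOP_TASKRUN
#  define IORING_SETUP_COOP_TASKRUN (1U << 8)
#endif
#ifndef IORING_SETUP_SINGLE_ISSUER
#  define IORING_SETUP_SINGLE_ISSUER (1U << 12)
#endif
#ifndef IORING_SETUP_DEFER_TASKRUN
#  define IORING_SETUP_DEFER_TASKRUN (1U << 13)
#endif

// Hypothetical names for the two configurations discussed in this thread.
enum class TaskrunMode {
    Coop,              // the "lighter" variant: COOP_TASKRUN only
    SingleIssuerDefer  // SINGLE_ISSUER | DEFER_TASKRUN, COOP_TASKRUN cleared
};

// Return `base` with its task-run related bits replaced according to `mode`.
// All other setup bits in `base` are left untouched.
inline std::uint32_t taskrun_flags(std::uint32_t base, TaskrunMode mode)
{
    const std::uint32_t taskrun_bits = IORING_SETUP_COOP_TASKRUN |
                                       IORING_SETUP_SINGLE_ISSUER |
                                       IORING_SETUP_DEFER_TASKRUN;
    std::uint32_t flags = base & ~taskrun_bits;

    switch (mode) {
    case TaskrunMode::Coop:
        flags |= IORING_SETUP_COOP_TASKRUN;
        break;
    case TaskrunMode::SingleIssuerDefer:
        flags |= IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN;
        break;
    }

    // DEFER_TASKRUN is only accepted together with SINGLE_ISSUER.
    assert(!(flags & IORING_SETUP_DEFER_TASKRUN) ||
           (flags & IORING_SETUP_SINGLE_ISSUER));
    return flags;
}
```

The result would be stored in p.flags of struct io_uring_params before
the ring is created, e.g. with liburing's io_uring_queue_init_params().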