Re: [LSF/MM/BPF TOPIC] block drivers in user space

On Mon, Mar 28, 2022 at 04:20:03PM -0400, Gabriel Krisman Bertazi wrote:
> Ming Lei <ming.lei@xxxxxxxxxx> writes:
> 
> > IMO it doesn't need an 'inverse io_uring'; the normal io_uring SQE/CQE model
> > already covers this case. The userspace part can submit SQEs beforehand
> > to get notified of each incoming io request from the kernel driver;
> > then, once an io request is queued to the driver, the driver can
> > post a CQE for a previously submitted SQE. The recently posted
> > IORING_OP_URING_CMD patch[1] is perfect for this purpose.
> >
> > I have written one such userspace block driver recently: [2] is the
> > kernel part, a blk-mq driver (ubd driver), and the userspace part is ubdsrv[3].
> > Both parts look quite simple but are still at a very early stage; so
> > far only the ubd-loop and ubd-null targets are implemented in [3]. Not only
> > is the io command communication channel done via IORING_OP_URING_CMD, but
> > IO handling for ubd-loop is also implemented via plain io_uring.
> >
> > It is basically working: for ubd-loop, I see no regression in 'xfstests -g auto'
> > on the ubd block device compared with the same xfstests run on the underlying
> > disk, and my simple performance test in a VM shows results no worse than the
> > kernel loop driver with dio, and even much better in some test situations.
> 
> Thanks for sharing.  This is a very interesting implementation that
> seems to cover quite well the original use case.  I'm giving it a try and
> will report back.
> 
> > Wrt. this userspace block driver work, I am more interested in the following
> > sub-topics:
> >
> > 1) zero copy
> > - the ubd driver[2] needs one data copy: for a WRITE request, pages in the
> >   io request are copied to a userspace buffer before ubdsrv handles the WRITE;
> >   for a READ request, the reverse copy is done after ubdsrv has handled
> >   the READ
> >
> > - I tried to apply zero copy via remap_pfn_range() to avoid this
> >   data copy, but it looks like it can't work for the ubd driver, since pages in
> >   the remapped vm area can't be retrieved by get_user_pages_*(), which is
> >   called in the direct io code path
> >
> > - recently Xiaoguang Wang posted an RFC patch[4] to support zero copy on
> >   tcmu, adding vm_insert_page(s)_mkspecial() for this purpose, but
> >   it has the same limitation as remap_pfn_range(). Xiaoguang also mentioned
> >   that vm_insert_pages may work, but anonymous pages cannot be remapped by
> >   vm_insert_pages.
> >
> > - here the requirement is to remap either anonymous pages or page cache
> >   pages into the userspace vm, with the mapping/unmapping done at runtime
> >   for each IO. Is this requirement reasonable? If yes, is there any
> >   easy way to implement it in the kernel?
> 
> I've run into the same issue with my fd implementation and haven't been
> able to work around it.
> 
> > 4) apply eBPF in userspace block driver
> > - it is an open topic; I don't have a specific or exact idea yet
> >
> > - is there a chance to apply eBPF for mapping ubd io onto its target handling,
> > to avoid the data copy and the remapping cost of zero copy?
> 
> I was thinking of something like this, or having a way for the server to
> only operate on the fds and do splice/sendfile.  But, I don't know if it
> would be useful for many use cases.  We also want to be able to send the
> data to userspace, for instance, for userspace networking.

I understand the key question is how to get the io data into the pages of
the ubd driver's request/bio. But splice/sendfile only transfers data between
two FDs, so how would the block request's bio pages get filled with the
expected data? Can you explain in a bit more detail?
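
To make the "between two FDs" point concrete, here is a minimal sketch in
Python (Linux only) of what an fd-only server could do with sendfile(2). It
is not taken from ubd/ubdsrv or from any posted patch; the names
copy_between_fds and demo are hypothetical, and temporary files stand in for
whatever fds a real server would hold. The server only tells the kernel which
fds to move bytes between and never touches the data itself, which is why it
is unclear how this would reach a request's bio pages.

```python
import os
import tempfile


def copy_between_fds(src_fd: int, dst_fd: int, length: int) -> int:
    """Copy up to `length` bytes from src_fd to dst_fd using sendfile(2).

    The bytes move inside the kernel; this process never reads them.
    Hypothetical helper, for illustration only.
    """
    copied = 0
    while copied < length:
        # Read from an explicit offset in src_fd; dst_fd is written at its
        # current file position. sendfile may transfer fewer bytes than asked.
        n = os.sendfile(dst_fd, src_fd, copied, length - copied)
        if n == 0:  # source hit EOF before `length` bytes were copied
            break
        copied += n
    return copied


def demo() -> bytes:
    """Copy 8 KiB between two temporary files and return what arrived."""
    payload = b"x" * 8192
    with tempfile.TemporaryFile() as src, tempfile.TemporaryFile() as dst:
        src.write(payload)
        src.flush()  # make sure the data is in the file before sendfile
        copy_between_fds(src.fileno(), dst.fileno(), len(payload))
        dst.seek(0)
        return dst.read()
```

Calling demo() should return the same 8192 bytes that were written to the
source file. Both ends of the transfer are still file descriptors, though,
not the pages attached to a block request.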

If the block layer is bypassed, the device won't be exposed to userspace as a
block disk.


thanks,
Ming



