Re: [RFC PATCH 08/22] ublk: bpf: add bpf struct_ops

On Sun, Jan 12, 2025 at 8:08 PM Ming Lei <tom.leiming@xxxxxxxxx> wrote:
>
> Hello Alexei,
>
> Thanks for your comments!
>
> On Thu, Jan 09, 2025 at 05:43:12PM -0800, Alexei Starovoitov wrote:
> > On Tue, Jan 7, 2025 at 4:08 AM Ming Lei <tom.leiming@xxxxxxxxx> wrote:
> > > +
> > > +/* Return true if io cmd is queued, otherwise forward it to userspace */
> > > +bool ublk_run_bpf_handler(struct ublk_queue *ubq, struct request *req,
> > > +                         queue_io_cmd_t cb)
> > > +{
> > > +       ublk_bpf_return_t ret;
> > > +       struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
> > > +       struct ublksrv_io_desc *iod = ublk_get_iod(ubq, req->tag);
> > > +       struct ublk_bpf_io *bpf_io = &data->bpf_data;
> > > +       const unsigned long total = iod->nr_sectors << 9;
> > > +       unsigned int done = 0;
> > > +       bool res = true;
> > > +       int err;
> > > +
> > > +       if (!test_bit(UBLK_BPF_IO_PREP, &bpf_io->flags))
> > > +               ublk_bpf_prep_io(bpf_io, iod);
> > > +
> > > +       do {
> > > +               enum ublk_bpf_disposition rc;
> > > +               unsigned int bytes;
> > > +
> > > +               ret = cb(bpf_io, done);
> >
> > High level observation...
> > I suspect forcing all struct_ops callbacks to have only these
> > two arguments and packing args into ublk_bpf_io
> > will be limiting in the long term.
>
> There are three callbacks defined, and only the two with the same type
> for queuing io commands are covered in this function.
>
> But yes, the callback types are part of the API, which should be designed
> carefully, and I will think about it further.
>
> >
> > And this part of api would need to be redesigned,
> > but since it's not an uapi... not a big deal.
> >
> > > +               rc = ublk_bpf_get_disposition(ret);
> > > +
> > > +               if (rc == UBLK_BPF_IO_QUEUED)
> > > +                       goto exit;
> > > +
> > > +               if (rc == UBLK_BPF_IO_REDIRECT)
> > > +                       break;
> >
> > Same point about return value processing...
> > Each struct_ops callback could have had its own meaning
> > of retvals.
> > I suspect it would have been more flexible and more powerful
> > this way.
>
> Yeah, I agree, just the 3rd callback of release_io_cmd_t isn't covered
> in this function.
>
> >
> > Other than that bpf plumbing looks good.
> >
> > There is an issue with leaking allocated memory in bpf_aio_alloc kfunc
> > (it probably should be KF_ACQUIRE)
>
> It is one problem which troubles me too:
>
> - another callback of struct_ops/bpf_aio_complete_cb is guaranteed to be
> called after the 'struct bpf_aio' instance is submitted via kfunc
> bpf_aio_submit(), and it is supposed to be freed from
> struct_ops/bpf_aio_complete_cb
>
> - but the following verifier failure is triggered if bpf_aio_alloc and
> bpf_aio_release are marked as KF_ACQUIRE & KF_RELEASE.
>
> ```
> libbpf: prog 'ublk_loop_comp_cb': -- BEGIN PROG LOAD LOG --
> Global function ublk_loop_comp_cb() doesn't return scalar. Only those are supported.
> ```

That's odd.
Adding KF_ACQ/REL to bpf_aio_alloc/release kfuncs shouldn't affect
verification of ublk_loop_comp_cb() prog. It's fine for it to stay 'void'
return.
You probably made it a global function, and that was the reason for this
verifier error. Global funcs have to return scalar for now.
We can relax this restriction if necessary.
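
To illustrate the static vs. global distinction, here is a minimal sketch
in plain C so that it compiles outside the kernel tree. Only the names
bpf_aio_release and ublk_loop_comp_and_release_aio come from this thread;
the struct layout, the `released` counter, the signatures and the
`_global` variant are hypothetical stand-ins, not the code from the patch set.

```c
/* Plain-C stand-ins so the sketch compiles outside the kernel tree. */
struct bpf_aio { long ret; };   /* hypothetical stand-in type */
static int released;            /* counts release calls, illustration only */

/* Stand-in for the bpf_aio_release() kfunc named in the thread. */
static void bpf_aio_release(struct bpf_aio *aio)
{
	(void)aio;
	released++;
}

/*
 * 'static' helper: in a BPF object the verifier checks a static subprog
 * in the context of its caller, so a void return type is accepted.
 */
static void ublk_loop_comp_and_release_aio(struct bpf_aio *aio, long ret)
{
	aio->ret = ret;
	bpf_aio_release(aio);
}

/*
 * Non-static helper: in a BPF object this would be a global function,
 * verified on its own, and it has to return a scalar for now.
 */
int ublk_loop_comp_and_release_aio_global(struct bpf_aio *aio, long ret)
{
	ublk_loop_comp_and_release_aio(aio, ret);
	return 0;
}
```

So marking the helper 'static' (or returning int from it) should be enough
to get past this particular verifier message.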

>
> Here the 'struct bpf_aio' instance isn't stored in a map, and it is provided
> from a struct_ops callback (bpf_aio_complete_cb). I would appreciate any
> ideas about how to let KF_ACQUIRE/KF_RELEASE cover the usage here.

This is so that:

ublk_loop_comp_cb ->
  ublk_loop_comp_and_release_aio ->
    bpf_aio_release

would properly recognize that ref to aio is dropped?

Currently the verifier doesn't support that,
but there is work in progress to add this feature:

https://lore.kernel.org/bpf/20241220195619.2022866-2-amery.hung@xxxxxxxxx/

then in cfi_stubs annotate the aio argument of bpf_aio_complete_cb()
as "struct bpf_aio *aio__ref"

Then the verifier will recognize that callback argument
comes refcounted and the prog has to call KF_RELEASE kfunc on it.
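
A rough sketch of what such a stub could look like, assuming the "__ref"
argument-name suffix from the series linked above. The ops struct name,
the stub name and the callback signature are made up for illustration;
only bpf_aio_complete_cb and "aio__ref" come from this thread.

```c
struct bpf_aio { int refcnt; };   /* hypothetical stand-in type */

/* Hypothetical struct_ops type; the real one is defined by the patch set. */
struct bpf_aio_ops_sketch {
	void (*bpf_aio_complete_cb)(struct bpf_aio *aio, long ret);
};

/*
 * Stub used only to describe the callback to the verifier. The "__ref"
 * suffix on the argument name is the annotation: the argument arrives
 * refcounted and the prog must pass it to a KF_RELEASE kfunc.
 */
static void bpf_aio_complete_cb_stub(struct bpf_aio *aio__ref, long ret)
{
	(void)aio__ref;
	(void)ret;
}

/* Table of stubs, in the style of a struct_ops cfi_stubs instance. */
static struct bpf_aio_ops_sketch bpf_aio_ops_stubs = {
	.bpf_aio_complete_cb = bpf_aio_complete_cb_stub,
};
```

The stub body stays empty; only the argument name matters here.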


>
> > and a few other things, but before doing any in depth review
> > from bpf pov I'd like to hear what block folks think.
>
> Me too, look forward to comments from our block guys.
>
> >
> > Motivation looks useful,
> > but the claim of performance gains without performance numbers
> > is a leap of faith.
>
> Some data follows:
>
> 1) ublk-null vs. ublk-null with bpf
>
> - 1.97M IOPS vs. 3.7M IOPS
>
> - setup ublk-null
>
>         cd tools/testing/selftests/ublk
>         ./ublk_bpf add -t null -q 2
>
> - setup ublk-null with bpf
>
>         cd tools/testing/selftests/ublk
>         ./ublk_bpf reg -t null ./ublk_null.bpf.o
>         ./ublk_bpf add -t null -q 2 --bpf_prog 0
>
> - run  `fio/t/io_uring -p 0 /dev/ublkb0`
>
> 2) ublk-loop
>
> The built-in `ublk_bpf` utility only supports bpf io handling. Compared
> with ublksrv, the improvement isn't that big, but it is still ~10%. One
> reason is that bpf aio has just been started and is not optimized yet.
> In theory:
>
> - it saves one kernel-user context switch
> - it saves one user-kernel IO buffer copy
> - the io handling code footprint is much smaller than with userspace io
>   handling
>
> The improvement is expected to be big, especially for IO workloads with
> a big chunk size.
>
>
> Thanks,
> Ming




