Re: [RFC PATCH 08/22] ublk: bpf: add bpf struct_ops

Hello Alexei,

Thanks for your comments!

On Thu, Jan 09, 2025 at 05:43:12PM -0800, Alexei Starovoitov wrote:
> On Tue, Jan 7, 2025 at 4:08 AM Ming Lei <tom.leiming@xxxxxxxxx> wrote:
> > +
> > +/* Return true if io cmd is queued, otherwise forward it to userspace */
> > +bool ublk_run_bpf_handler(struct ublk_queue *ubq, struct request *req,
> > +                         queue_io_cmd_t cb)
> > +{
> > +       ublk_bpf_return_t ret;
> > +       struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
> > +       struct ublksrv_io_desc *iod = ublk_get_iod(ubq, req->tag);
> > +       struct ublk_bpf_io *bpf_io = &data->bpf_data;
> > +       const unsigned long total = iod->nr_sectors << 9;
> > +       unsigned int done = 0;
> > +       bool res = true;
> > +       int err;
> > +
> > +       if (!test_bit(UBLK_BPF_IO_PREP, &bpf_io->flags))
> > +               ublk_bpf_prep_io(bpf_io, iod);
> > +
> > +       do {
> > +               enum ublk_bpf_disposition rc;
> > +               unsigned int bytes;
> > +
> > +               ret = cb(bpf_io, done);
> 
> High level observation...
> I suspect forcing all sturct_ops callbacks to have only these
> two arguments and packing args into ublk_bpf_io
> will be limiting in the long term.

Three callbacks are defined, and only the two with the same type, used for
queuing io commands, are covered by this function.

But yes, the callback type is part of the API and should be designed
carefully. I will think about it further.

> 
> And this part of api would need to be redesigned,
> but since it's not an uapi... not a big deal.
> 
> > +               rc = ublk_bpf_get_disposition(ret);
> > +
> > +               if (rc == UBLK_BPF_IO_QUEUED)
> > +                       goto exit;
> > +
> > +               if (rc == UBLK_BPF_IO_REDIRECT)
> > +                       break;
> 
> Same point about return value processing...
> Each struct_ops callback could have had its own meaning
> of retvals.
> I suspect it would have been more flexible and more powerful
> this way.

Yeah, I agree. Only the 3rd callback, release_io_cmd_t, isn't covered by
this function.

> 
> Other than that bpf plumbing looks good.
> 
> There is an issue with leaking allocated memory in bpf_aio_alloc kfunc
> (it probably should be KF_ACQUIRE)

Yes, this problem troubles me too:

- the other struct_ops callback, bpf_aio_complete_cb, is guaranteed to be
called after the 'struct bpf_aio' instance is submitted via the kfunc
bpf_aio_submit(), and the instance is supposed to be freed from
struct_ops/bpf_aio_complete_cb

- but the following verifier failure is triggered if bpf_aio_alloc and
bpf_aio_release are marked KF_ACQUIRE & KF_RELEASE:

```
libbpf: prog 'ublk_loop_comp_cb': -- BEGIN PROG LOAD LOG --
Global function ublk_loop_comp_cb() doesn't return scalar. Only those are supported.
```

Here the 'struct bpf_aio' instance isn't stored in a map; it is handed to
the struct_ops callback (bpf_aio_complete_cb). I would appreciate any idea
on how to make KF_ACQUIRE/KF_RELEASE cover this usage.

> and a few other things, but before doing any in depth review
> from bpf pov I'd like to hear what block folks think.

Me too; I look forward to comments from the block folks.

> 
> Motivation looks useful,
> but the claim of performance gains without performance numbers
> is a leap of faith.

Here are some numbers:

1) ublk-null vs. ublk-null with bpf

- 1.97M IOPS vs. 3.7M IOPS

- setup ublk-null

	cd tools/testing/selftests/ublk
	./ublk_bpf add -t null -q 2

- setup ublk-null with bpf

	cd tools/testing/selftests/ublk
	./ublk_bpf reg -t null ./ublk_null.bpf.o
	./ublk_bpf add -t null -q 2 --bpf_prog 0

- run `fio/t/io_uring -p 0 /dev/ublkb0`

2) ublk-loop

The built-in `ublk_bpf` utility only supports bpf io handling. Compared with
ublksrv, the improvement isn't that big yet, still around 10%. One reason is
that bpf aio has just been started and isn't optimized. In theory it:

- saves one kernel-user context switch
- saves one user-kernel IO buffer copy
- has a much smaller io-handling code footprint than userspace io handling

The improvement should be bigger, especially with big chunk size IO
workloads.


Thanks,
Ming



