Re: [RFC PATCH 1/4] bpf: add new prog_type BPF_PROG_TYPE_IO_FILTER

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Mon, 17 Aug 2020 09:32:07 -0700

On Mon, Aug 17, 2020 at 10:18:47PM +0800, Bob Liu wrote:
> > +
> > +/* allows IO by default if no programs attached */
> > +int io_filter_bpf_run(struct bio *bio)
> > +{
> > +	struct bpf_io_request io_req = {
> > +		.sector_start = bio->bi_iter.bi_sector,
> > +		.sector_cnt = bio_sectors(bio),
> > +		.opf = bio->bi_opf,
> > +	};
> > +
> > +	return BPF_PROG_RUN_ARRAY_CHECK(bio->bi_disk->progs, &io_req, BPF_PROG_RUN);
> 
> 
> I think pass "struct bpf_io_request" is not enough, since we may want to do the filter based on
> some special patterns against the io data.
> 
> I used to pass "page_to_virt(bio->bi_io_vec->bv_page)" into ebpf program..

Bob,

Just like other bpf uapi structs the bpf_io_request is extensible and
such pointer can be added later, but I have a different question.

Leah,

Do you really need the arguments to be stable?
If so 'opf' above is not enough.
sector_start, sector_cnt are clean from uapi pov,
but 'opf' exposes kernel internals.
The patch 2 is doing:
+int protect_gpt(struct bpf_io_request *io_req)
+{
+       /* within GPT and not a read operation */
+       if (io_req->sector_start < GPT_SECTORS && (io_req->opf & REQ_OP_MASK) != REQ_OP_READ)
+               return IO_BLOCK;

The way ops are encoded changed quite a bit over the kernel releases.
First it was REQ_WRITE, then REQ_OP_SHIFT, now REQ_OP_MASK.
>From kernel pov it would be simpler if bpf side didn't impose stability
requriment on the program arguments. Then the kernel will be free to change
REG_OP_READ into something else. The progs would break, of course, and would
have to be adjusted. That's what we've been doing with tools like biosnoop.
If you're ok with unstable arguments then you wouldn't need to introduce
new prog type and this patch set.
You can do this filtering already with should_fail_bio().
bpf prog can attach to should_fail_bio() and walk all bio arguments
in unstable way.
Instead of:
+       if (io_req->sector_start < GPT_SECTORS && (io_req->opf & REQ_OP_MASK) != REQ_OP_READ)
you'll write:
  if (bio->bi_iter.bi_sector < GPT_SECTORS && (bio->bi_opf & REQ_OP_MASK) != REQ_OP_READ)
It will also work on different kernels because libbpf can adjust field offsets and
check for type matching via CO-RE facility.
Will that work for you?