Re: [PATCH v9 bpf-next 3/5] bpf: Add skb dynptrs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 8, 2023 at 12:13 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>
> On Tue, Feb 7, 2023 at 6:25 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Fri, Feb 03, 2023 at 01:37:46PM -0800, Andrii Nakryiko wrote:
> > > On Thu, Feb 2, 2023 at 3:43 AM Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, Feb 1, 2023 at 5:21 PM Andrii Nakryiko
> > > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Jan 31, 2023 at 4:40 PM Alexei Starovoitov
> > > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Tue, Jan 31, 2023 at 04:11:47PM -0800, Andrii Nakryiko wrote:
> > > > > > > >
> > > > > > > > When prog is just parsing the packet it doesn't need to finalize with bpf_dynptr_write.
> > > > > > > > The prog can always write into the pointer followed by if (p == buf) bpf_dynptr_write.
> > > > > > > > No need for rdonly flag, but extra copy is there in case of cloned which
> > > > > > > > could have been avoided with extra rd_only flag.
> > > > > > >
> > > > > > > Yep, given we are designing bpf_dynptr_slice for performance, extra
> > > > > > > copy on reads is unfortunate. ro/rw flag or have separate
> > > > > > > bpf_dynptr_slice_rw vs bpf_dynptr_slice_ro?
> > > > > >
> > > > > > Either flag or two kfuncs sound good to me.
> > > > >
> > > > > Would it make sense to make bpf_dynptr_slice() as read-only variant,
> > > > > and bpf_dynptr_slice_rw() for read/write? I think the common case is
> > > > > read-only, right? And if users mistakenly use bpf_dynptr_slice() for
> > > > > r/w case, they will get a verifier error when trying to write into the
> > > > > returned pointer. While if we make bpf_dynptr_slice() as read-write,
> > > > > users won't realize they are paying a performance penalty for
> > > > > something that they don't actually need.
> > > >
> > > > Makes sense and it matches skb_header_pointer() usage in the kernel
> > > > which is read-only. Since there is no verifier the read-only-ness
> > > > is not enforced, but we can do it.
> > > >
> > > > Looks like we've converged on bpf_dynptr_slice() and bpf_dynptr_slice_rw().
> > > > The question remains what to do with bpf_dynptr_data() backed by skb/xdp.
> > > > Should we return EINVAL to discourage its usage?
> > > > Of course, we can come up with sensible behavior for bpf_dynptr_data(),
> > > > but it will have quirks that will be not easy to document.
> > > > Even with extensive docs the users might be surprised by the behavior.
> > >
> > > I feel like having bpf_dynptr_data() working in the common case for
> > > skb/xdp would be nice (e.g., so basically at least work in cases when
> > > we don't need to pull).
> > >
> > > But we've been discussing bpf_dynptr_slice() with Joanne today, and we
> > > came to the conclusion that bpf_dynptr_slice()/bpf_dynptr_slice_rw()
> > > should work for any kind of dynptr (LOCAL, RINGBUF, SKB, XDP). So
> > > generic code that wants to work with any dynptr would be able to just
> > > use bpf_dynptr_slice, even for LOCAL/RINGBUF, even though buffer won't
> > > ever be filled for LOCAL/RINGBUF.
> >
> > great
> >
> > > In application, though, if I know I'm working with LOCAL or RINGBUF
> > > (or MALLOC, once we have it), I'd use bpf_dynptr_data() to fill out
> > > fixed parts, of course. bpf_dynptr_slice() would be cumbersome for
> > > such cases (especially if I have some huge fixed part that I *know* is
> > > available in RINGBUF/MALLOC case).
> >
> > bpf_dynptr_data() for local and ringbuf is fine, of course.
> > It already exists and has to continue working.
> > bpf_dynptr_data() for xdp is probably ok as well,
> > but bpf_dynptr_data() for skb is problematic.
> > data/data_end concept looked great back in 2016 when it was introduced
> > and lots of programs were written, but we underestimated the impact
> > of driver's copybreak on programs.
> > Network parsing progs consume headers one by one and would typically
> > be written as:
> > if (header > data_end)
> >    return DROP;
> > Some drivers copybreak fixed number of bytes. Others try to be smart
> > and copy only headers into linear part of skb.
> > The drivers also change. At one point we tried to upgrade the kernel
> > and suddenly bpf firewall started blocking valid traffic.
> > Turned out the driver copybreak heuristic was changed in that kernel.
> > The bpf prog was converted to use skb_load_bytes() and the fire was extinguished.
> > It was a hard lesson.
> > Others learned the danger of data/data_end the hard way as well.
> > Take a look at cloudflare's progs/test_cls_redirect.c.
> > It's a complicated combination of data/data_end and skb_load_bytes().
> > It's essentially implementing skb_header_pointer.
> > I wish we could use bpf_dynptr_slice only and remove data/data_end,
> > but we cannot, since it's uapi.
> > But we shouldn't repeat the same mistake. If we do bpf_dynptr_data()
> > that returns linear part people will be hitting the same fragility and
> > difficult to debug bugs.
> > bpf_dynptr_data() for XDP is ok-ish, since most of XDP is still
> > page-per-packet, but patches to split headers in HW are starting to appear.
> > So even for XDP data/data_end concept may bite us.
> > Hence my preference is to EINVAL in bpf_dynptr_data() at least for skb,
> > since bpf_dynptr_slice() is a strictly better alternative.
>
> This makes sense to me, I will have the next version of this patchset
> return -EINVAL if bpf_dynptr_data() is used on a skb or xdp dynptr

+1, sounds reasonable to me as well



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux