On Wed, Feb 8, 2023 at 12:13 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote: > > On Tue, Feb 7, 2023 at 6:25 PM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Fri, Feb 03, 2023 at 01:37:46PM -0800, Andrii Nakryiko wrote: > > > On Thu, Feb 2, 2023 at 3:43 AM Alexei Starovoitov > > > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > > > > > On Wed, Feb 1, 2023 at 5:21 PM Andrii Nakryiko > > > > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > > > > > > > On Tue, Jan 31, 2023 at 4:40 PM Alexei Starovoitov > > > > > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > > > > > > > > > On Tue, Jan 31, 2023 at 04:11:47PM -0800, Andrii Nakryiko wrote: > > > > > > > > > > > > > > > > When prog is just parsing the packet it doesn't need to finalize with bpf_dynptr_write. > > > > > > > > The prog can always write into the pointer followed by if (p == buf) bpf_dynptr_write. > > > > > > > > No need for rdonly flag, but extra copy is there in case of cloned which > > > > > > > > could have been avoided with extra rd_only flag. > > > > > > > > > > > > > > Yep, given we are designing bpf_dynptr_slice for performance, extra > > > > > > > copy on reads is unfortunate. ro/rw flag or have separate > > > > > > > bpf_dynptr_slice_rw vs bpf_dynptr_slice_ro? > > > > > > > > > > > > Either flag or two kfuncs sound good to me. > > > > > > > > > > Would it make sense to make bpf_dynptr_slice() as read-only variant, > > > > > and bpf_dynptr_slice_rw() for read/write? I think the common case is > > > > > read-only, right? And if users mistakenly use bpf_dynptr_slice() for > > > > > r/w case, they will get a verifier error when trying to write into the > > > > > returned pointer. While if we make bpf_dynptr_slice() as read-write, > > > > > users won't realize they are paying a performance penalty for > > > > > something that they don't actually need. > > > > > > > > Makes sense and it matches skb_header_pointer() usage in the kernel > > > > which is read-only. Since there is no verifier the read-only-ness > > > > is not enforced, but we can do it. > > > > > > > > Looks like we've converged on bpf_dynptr_slice() and bpf_dynptr_slice_rw(). > > > > The question remains what to do with bpf_dynptr_data() backed by skb/xdp. > > > > Should we return EINVAL to discourage its usage? > > > > Of course, we can come up with sensible behavior for bpf_dynptr_data(), > > > > but it will have quirks that will be not easy to document. > > > > Even with extensive docs the users might be surprised by the behavior. > > > > > > I feel like having bpf_dynptr_data() working in the common case for > > > skb/xdp would be nice (e.g., so basically at least work in cases when > > > we don't need to pull). > > > > > > But we've been discussing bpf_dynptr_slice() with Joanne today, and we > > > came to the conclusion that bpf_dynptr_slice()/bpf_dynptr_slice_rw() > > > should work for any kind of dynptr (LOCAL, RINGBUF, SKB, XDP). So > > > generic code that wants to work with any dynptr would be able to just > > > use bpf_dynptr_slice, even for LOCAL/RINGBUF, even though buffer won't > > > ever be filled for LOCAL/RINGBUF. > > > > great > > > > > In application, though, if I know I'm working with LOCAL or RINGBUF > > > (or MALLOC, once we have it), I'd use bpf_dynptr_data() to fill out > > > fixed parts, of course. bpf_dynptr_slice() would be cumbersome for > > > such cases (especially if I have some huge fixed part that I *know* is > > > available in RINGBUF/MALLOC case). > > > > bpf_dynptr_data() for local and ringbuf is fine, of course. > > It already exists and has to continue working. > > bpf_dynptr_data() for xdp is probably ok as well, > > but bpf_dynptr_data() for skb is problematic. > > data/data_end concept looked great back in 2016 when it was introduced > > and lots of programs were written, but we underestimated the impact > > of driver's copybreak on programs. > > Network parsing progs consume headers one by one and would typically > > be written as: > > if (header > data_end) > > return DROP; > > Some drivers copybreak fixed number of bytes. Others try to be smart > > and copy only headers into linear part of skb. > > The drivers also change. At one point we tried to upgrade the kernel > > and suddenly bpf firewall started blocking valid traffic. > > Turned out the driver copybreak heuristic was changed in that kernel. > > The bpf prog was converted to use skb_load_bytes() and the fire was extinguished. > > It was a hard lesson. > > Others learned the danger of data/data_end the hard way as well. > > Take a look at cloudflare's progs/test_cls_redirect.c. > > It's a complicated combination of data/data_end and skb_load_bytes(). > > It's essentially implementing skb_header_pointer. > > I wish we could use bpf_dynptr_slice only and remove data/data_end, > > but we cannot, since it's uapi. > > But we shouldn't repeat the same mistake. If we do bpf_dynptr_data() > > that returns linear part people will be hitting the same fragility and > > difficult to debug bugs. > > bpf_dynptr_data() for XDP is ok-ish, since most of XDP is still > > page-per-packet, but patches to split headers in HW are starting to appear. > > So even for XDP data/data_end concept may bite us. > > Hence my preference is to EINVAL in bpf_dynptr_data() at least for skb, > > since bpf_dynptr_slice() is a strictly better alternative. > > This makes sense to me, I will have the next version of this patchset > return -EINVAL if bpf_dynptr_data() is used on a skb or xdp dynptr +1, sounds reasonable to me as well