Re: [PATCH bpf-next v1 1/3] bpf: Add skb dynptrs

Joanne Koong <joannelkoong@xxxxxxxxx> · Wed, 3 Aug 2022 20:44:39 -0700

On Wed, Aug 3, 2022 at 6:34 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> On Wed, 3 Aug 2022 18:05:44 -0700 Joanne Koong wrote:
> > I think the problem is that the skb may be cloned, so a write into any
> > portion of the paged data requires a pull. If it weren't for this,
> > then we could do the write and checksumming without pulling (eg kmap
> > the page, get the csum_partial of the bytes you'll write over, do the
> > write, get the csum_partial of the bytes you just wrote, then unkmap,
> > then update skb->csum to be skb->csum - csum of the bytes you wrote
> > over + csum of the bytes you wrote). I think we would even be able to
> > provide a direct data slice to non-contiguous pages without needing
> > the additional copy to a stack buffer (eg kmap the non-contiguous
> > pages to a contiguous virtual address that we pass back to the bpf
> > program, and then when the bpf program is finished do the cleanup for
> > the mappings).
>
> The whole read/write/data concept is not a great match for packet
> parsing. Primary use for packet parsing is that you want to read
> a header and not have to deal with frags or pulling. In that case
> you should get a direct pointer or a copy on the stack, transparently.

The selftests [0] includes some examples of packet parsing using
dynptrs. You're able to get a pointer to the headers (if it's in the
head) directly, or you can use bpf_dynptr_read() to read the data from
the frag into a buffer (without needing to pull; bpf_dynptr_read()
essentially just calls bpf_skb_load_bytes()).

Does this address the use cases you have in mind?

I think the pull and unclone stuff only pertains to the cases for
writes into the frags.

[0] https://lore.kernel.org/bpf/20220726184706.954822-4-joannelkoong@xxxxxxxxx/

>
> Maybe before I go on talking nonsense I should read up on what dynptr
> is and what it's trying to achieve. Stan says its like unique_ptr in
> C++ which tells me.. nothing :)
>
> $ git grep dynptr -- Documentation/
> $
>
> Any pointers?

Ooh thanks for the reminder, adding a page for the dynptr
documentation is on my to-do list

A dynptr (also known as fat pointers in other languages) is a pointer
that stores extra metadata alongside the address it points to. In
particular, the metadata the bpf dynptr stores includes the length of
data it points to (so in the case of skb/xdp, the size of the packet),
the type of dynptr, and properties like whether it's read-only.
Dynptrs are an interface for bpf programs to dynamically access
memory, because the helper calls enforce that the accesses are safe.
For example, bpf_dynptr_read() allows reads at offsets and lengths not
statically known at compile time (in bpf_dynptr_read, the kernel uses
the metadata to check that the offset and length doesn't violate
memory bounds); without dynptrs, the verifier can't guarantee that the
offset or length of the read is safe since those values aren't
statically known at compile time, so for example you can't directly
read a dynamic number of bytes from a packet.

With regards to skb + xdp, the main use case of dynptrs are to 1)
allow dynamic-size accesses of packet data and 2) allow easier and
simpler packet parsing (for example, accessing skb->data directly
requires multiple if checks for ensuring it's within bounds of
skb->data_end in order to satisfy the verifier; with the dynptr
interface, you are able to get a direct data slice and access it
without needing the checks. The selftests 3rd patch has some
demonstrations of this).

>
> > Three ideas I'm thinking through as a possible solution:
> > 1) Enforce that the skb is always uncloned for skb-type bpf progs (we
> > currently do this just for the skb head, see bpf_unclone_prologue()),
> > but I'm not sure if the trade-off (pulling all the packet data, even
> > if it won't be used) is acceptable.
> >
> > 2) Don't support cloned skbs for bpf_dynptr_write/data and don't do
> > any pulling. If the prog wants to use bpf_dynptr_write/data, then they
> > have to pull it first
>
> I think all output skbs from TCP are cloned, so that's not gonna work.
>
> > 2) (uglier than #1 and #2) For bpf_dynptr_write()s, pull if the write
> > is to a paged area and the skb is cloned, otherwise write to the paged
> > area without pulling; if we do this, then we always have to invalidate
> > all data slices associated with the skb (even for writes to the head)
> > since at prog load time, the verifier doesn't know if the pull happens
> > or not. For bpf_dynptr_data()s, follow the same policy.
> >
> > I'm leaning towards 2. What are your thoughts?