On 1/30/23 9:30 PM, Alexei Starovoitov wrote:
Also bpf_skb_pull_data() is quite heavy. For progs that only want to parse
the packet, calling that in bpf_dynptr_data() is a heavy hammer.
It feels that we need to go back to skb_header_pointer-like discussion.
Something like:
bpf_dynptr_slice(const struct bpf_dynptr *ptr, u32 offset, u32 len, void *buffer)
Whether the buffer is part of the dynptr or provided by the program is TBD.
Making it hidden within the dynptr would make this approach unreliable
(memory allocations can fail, etc.). But if we ask users to pass it in
directly, then it should be relatively easy to use in practice with a
pre-allocated per-CPU buffer:
bpf_skb_pull_data() is even more unreliable, since it's a bigger allocation.
I like the preallocated approach more, so we're in agreement here.
struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, char[4096]);
} scratch SEC(".maps");
...
struct bpf_dynptr *dp = bpf_dynptr_from_skb(...);
void *p, *buf;
int zero = 0;

buf = bpf_map_lookup_elem(&scratch, &zero);
if (!buf)
	return 0; /* can't happen */

p = bpf_dynptr_slice(dp, off, 16, buf);
if (p == NULL) {
	/* out of range */
} else {
	/* work with p directly */
}

/* if we wrote something to p and it was copied into buffer, write it back */
if (p == buf)
	bpf_dynptr_write(dp, off, buf, 16, 0);
We'll just need to teach the verifier to make sure that buf is at least 16
bytes long.
A fifth __sz arg may do:
bpf_dynptr_slice(const struct bpf_dynptr *ptr, u32 offset, u32 len,
		 void *buffer, u32 buffer__sz);
We'll need to make sure that buffer__sz is >= len (or preferably not
require the extra size arg at all). We can check that at runtime, of course,
but rejecting a too-small buffer at verification time would be a better
experience.
I don't follow. Why two equivalent 'len' args?
Just to allow 'len' to be a variable instead of a constant?
It's unusual for the verifier to have 'len' before 'buffer',
but this is fixable.
Agree. One const scalar 'len' should be enough. The buffer should have the
same size as the requested slice.
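To make the trade-off concrete, here is a userspace sketch (all names hypothetical, not the proposed kernel API) of the run-time checks a bpf_dynptr_slice() taking a buffer size would have to perform if the verifier did not reject a too-small buffer up front:

```c
#include <stddef.h>

/* Userspace model of the run-time bounds checks: the requested range
 * must fit inside the packet and the scratch buffer must be able to
 * hold 'len' bytes. Names and layout are illustrative only. */
struct pkt {
	unsigned char *data;
	size_t size;
};

static void *dynptr_slice_model(struct pkt *p, size_t offset, size_t len,
				void *buffer, size_t buffer_sz)
{
	if (offset > p->size || len > p->size - offset)
		return NULL;		/* out of range */
	if (buffer_sz < len)
		return NULL;		/* scratch buffer too small */
	return p->data + offset;	/* linear data: direct pointer */
}
```

Rejecting the buffer-too-small case at verification time would turn the second check into dead code, which is the better experience argued for above.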
How about adding a 'rd_only vs rdwr' flag?
Then MEM_RDONLY for the ret value of bpf_dynptr_slice() can be set by the
verifier, and at run time bpf_dynptr_slice() wouldn't need to check
skb->cloned:
if (rd_only)
	return skb_header_pointer(...);
if (rdwr) {
	bpf_try_make_writable();
	return skb->data + off;
}
and a final bpf_dynptr_write() is not needed.
But that doesn't work for xdp, since there is no pull.
It's not clear how to deal with BPF_F_RECOMPUTE_CSUM though.
Expose __skb_postpull_rcsum/__skb_postpush_rcsum as kfuncs?
But that defeats Andrii's goal to use dynptr as a generic wrapper.
skb is quite special.
Maybe something like:
void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, u32 offset, u32 len,
		       void *buffer, u32 buffer__sz)
{
	if (skb_cloned()) {
		skb_copy_bits(skb, offset, buffer, len);
		return buffer;
	}
	return skb_header_pointer(...);
}
When the prog is just parsing the packet, it doesn't need to finalize with
bpf_dynptr_write.
The prog can always write into the pointer and follow up with
if (p == buf) bpf_dynptr_write(...).
No need for a rdonly flag, but the extra copy is there in the cloned case,
which could have been avoided with an extra rd_only flag.
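The p == buf write-back pattern can be modeled in plain userspace C, with memcpy standing in for skb_copy_bits() and bpf_dynptr_write() (all names here are illustrative, not the kernel API):

```c
#include <string.h>
#include <stddef.h>

/* Userspace model of the read-modify-write flow: for a "cloned" packet
 * the slice copies into the caller's buffer, and p == buf tells the
 * caller that a bpf_dynptr_write()-style write-back is still needed. */
static void *slice_model(int cloned, unsigned char *data, size_t off,
			 size_t len, unsigned char *buf)
{
	if (cloned) {
		memcpy(buf, data + off, len);	/* skb_copy_bits() analogue */
		return buf;
	}
	return data + off;			/* skb_header_pointer() fast path */
}

static void write_model(unsigned char *data, size_t off,
			const unsigned char *src, size_t len)
{
	memcpy(data + off, src, len);		/* bpf_dynptr_write() analogue */
}
```

With cloned == 0 an edit through the returned pointer lands in the packet directly; with cloned == 1 it only reaches the packet after the write-back, which is exactly the copy (and final write) a rd_only flag would let a pure parser skip.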
In case of xdp it will be:
void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, u32 offset, u32 len,
		       void *buffer, u32 buffer__sz)
{
	void *p = bpf_xdp_pointer(xdp, offset, len);

	if (p)
		return p;
	bpf_xdp_copy_buf(xdp, offset, buffer, len, false); /* copy into buffer */
	return buffer;
}
bpf_dynptr_write() will use bpf_xdp_copy_buf(..., true); /* copy into xdp */
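For xdp the fallback matters because a direct pointer is only possible when the requested range stays inside one linear fragment. A userspace sketch of that logic (the fragment layout and every name here are made up for illustration; this is not bpf_xdp_pointer()'s implementation):

```c
#include <stddef.h>

/* Each "fragment" is linear; a direct pointer works only when
 * [off, off + len) fits inside one fragment, otherwise gather into
 * buf, modeling the bpf_xdp_copy_buf() fallback. Bounds assumed valid. */
struct frag {
	unsigned char *data;
	size_t size;
};

static void *xdp_slice_model(struct frag *frags, size_t nr,
			     size_t off, size_t len, unsigned char *buf)
{
	size_t base = 0;

	for (size_t i = 0; i < nr; base += frags[i++].size) {
		if (off < base + frags[i].size) {
			if (off + len <= base + frags[i].size)
				return frags[i].data + (off - base);
			break;	/* spans fragments: fall back to copy */
		}
	}
	/* gather copy, one byte at a time for clarity */
	for (size_t i = 0; i < len; i++) {
		size_t pos = off + i, b = 0, j = 0;

		while (pos >= b + frags[j].size)
			b += frags[j++].size;
		buf[i] = frags[j].data[pos - b];
	}
	return buf;
}
```

As in the skb case, the caller only needs the p == buf check to know whether a later bpf_dynptr_write()-style flush is required.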
My preference would be making bpf_dynptr_slice() work similarly for skb and
xdp, so the skb and xdp logic of bpf_dynptr_slice() above looks good.
Regarding MEM_RDONLY, it probably is not relevant to xdp. For skb, I'm not
sure how often 'skb_cloned() && !skb_clone_writable()' happens. Maybe it can
be left as a later optimization?
Regarding BPF_F_RECOMPUTE_CSUM, I wonder if bpf_csum_diff() is enough to come up
the csum. Then the missing kfunc is to update the skb->csum. Not sure how the
csum logic will look like in xdp, probably getting csum from the xdp-hint,
calculate csum_diff and then set it to the to-be-created skb. All this is likely
a kfunc also, eg. a kfunc to directly allocate skb during the XDP_PASS case. The
bpf prog will have to be written differently if it needs to deal with the csum
but the header parsing part could at least be shared.
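On the bpf_csum_diff() question: for the standard Internet checksum an edit can indeed be folded in incrementally (RFC 1624 style) rather than re-summing the whole packet. A small userspace illustration of that arithmetic (not kernel code):

```c
#include <stdint.h>
#include <stddef.h>

/* One's-complement sum over a byte buffer (RFC 1071). */
static uint32_t csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	return sum;
}

/* Fold the carries and complement to get the final 16-bit checksum. */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Incremental update (RFC 1624 Eqn. 3): HC' = ~(~HC + ~m + m').
 * Only the changed 16-bit word is needed, which is the idea behind
 * computing a csum_diff instead of a full re-sum. */
static uint16_t csum_update(uint16_t hc, uint16_t m_old, uint16_t m_new)
{
	uint32_t s = (uint16_t)~hc + (uint16_t)~m_old + m_new;

	return csum_fold(s);
}
```

If this carries over, the missing piece really is just a kfunc that stores the updated value into skb->csum, as suggested above.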