Re: [PATCH v9 bpf-next 3/5] bpf: Add skb dynptrs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/30/23 5:04 PM, Andrii Nakryiko wrote:
On Mon, Jan 30, 2023 at 2:31 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:

On Mon, Jan 30, 2023 at 02:04:08PM -0800, Martin KaFai Lau wrote:
On 1/27/23 11:17 AM, Joanne Koong wrote:
@@ -8243,6 +8316,28 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
             mark_reg_known_zero(env, regs, BPF_REG_0);
             regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag;
             regs[BPF_REG_0].mem_size = meta.mem_size;
+           if (func_id == BPF_FUNC_dynptr_data &&
+               dynptr_type == BPF_DYNPTR_TYPE_SKB) {
+                   bool seen_direct_write = env->seen_direct_write;
+
+                   regs[BPF_REG_0].type |= DYNPTR_TYPE_SKB;
+                   if (!may_access_direct_pkt_data(env, NULL, BPF_WRITE))
+                           regs[BPF_REG_0].type |= MEM_RDONLY;
+                   else
+                           /*
+                            * Calling may_access_direct_pkt_data() will set
+                            * env->seen_direct_write to true if the skb is
+                            * writable. As an optimization, we can ignore
+                            * setting env->seen_direct_write.
+                            *
+                            * env->seen_direct_write is used by skb
+                            * programs to determine whether the skb's page
+                            * buffers should be cloned. Since data slice
+                            * writes would only be to the head, we can skip
+                            * this.
+                            */
+                           env->seen_direct_write = seen_direct_write;
+           }

[ ... ]

@@ -9263,17 +9361,26 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
                             return ret;
                     break;
             case KF_ARG_PTR_TO_DYNPTR:
+           {
+                   enum bpf_arg_type dynptr_arg_type = ARG_PTR_TO_DYNPTR;
+
                     if (reg->type != PTR_TO_STACK &&
                         reg->type != CONST_PTR_TO_DYNPTR) {
                             verbose(env, "arg#%d expected pointer to stack or dynptr_ptr\n", i);
                             return -EINVAL;
                     }
-                   ret = process_dynptr_func(env, regno, insn_idx,
-                                             ARG_PTR_TO_DYNPTR | MEM_RDONLY);
+                   if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_skb])
+                           dynptr_arg_type |= MEM_UNINIT | DYNPTR_TYPE_SKB;
+                   else
+                           dynptr_arg_type |= MEM_RDONLY;
+
+                   ret = process_dynptr_func(env, regno, insn_idx, dynptr_arg_type,
+                                             meta->func_id);
                     if (ret < 0)
                             return ret;
                     break;
+           }
             case KF_ARG_PTR_TO_LIST_HEAD:
                     if (reg->type != PTR_TO_MAP_VALUE &&
                         reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
@@ -15857,6 +15964,14 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
                desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) {
             insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_1);
             *cnt = 1;
+   } else if (desc->func_id == special_kfunc_list[KF_bpf_dynptr_from_skb]) {
+           bool is_rdonly = !may_access_direct_pkt_data(env, NULL, BPF_WRITE);

Does it need to restore the env->seen_direct_write here also?

It seems this 'seen_direct_write' saving/restoring is needed now because
'may_access_direct_pkt_data(BPF_WRITE)' is not only called when it is
actually writing the packet. Some refactoring can help to avoid issue like
this.

While at 'seen_direct_write', Alexei has also pointed out that the verifier
needs to track whether the (packet) 'slice' returned by bpf_dynptr_data()
has been written. It should be tracked in 'seen_direct_write'. Take a look
at how reg_is_pkt_pointer() and may_access_direct_pkt_data() are done in
check_mem_access(). iirc, this reg_is_pkt_pointer() part got loss somewhere
in v5 (or v4?) when bpf_dynptr_data() was changed to return register typed
PTR_TO_MEM instead of PTR_TO_PACKET.

btw tc progs are using gen_prologue() approach because data/data_end are not kfuncs
(nothing is being called by the bpf prog).
In this case we don't need to repeat this approach. If so we don't need to
set seen_direct_write.
Instead bpf_dynptr_data() can call bpf_skb_pull_data() directly.
And technically we don't need to limit it to skb head. It can handle any off/len.
It will work for skb, but there is no equivalent for xdp_pull_data().
I don't think we can implement xdp_pull_data in all drivers.
That's massive amount of work, but we need to be consistent if we want
dynptr to wrap both skb and xdp.
We can say dynptr_data is for head only, but we've seen bugs where people
had to switch from data/data_end to load_bytes.

Also bpf_skb_pull_data is quite heavy. For progs that only want to parse
the packet calling that in bpf_dynptr_data is a heavy hammer.

It feels that we need to go back to skb_header_pointer-like discussion.
Something like:
bpf_dynptr_slice(const struct bpf_dynptr *ptr, u32 offset, u32 len, void *buffer)
Whether buffer is a part of dynptr or program provided is tbd.

making it hidden within dynptr would make this approach unreliable
(memory allocations, which can fail, etc). But if we ask users to pass
it directly, then it should be relatively easy to use in practice with
some pre-allocated per-CPU buffer:


struct {
   __int(type, BPF_MAP_TYPE_PERCPU_ARRAY);
   __int(max_entries, 1);
   __type(key, int);
   __type(value, char[4096]);
} scratch SEC(".maps");


...


struct dyn_ptr *dp = bpf_dynptr_from_skb(...).
void *p, *buf;
int zero = 0;

buf = bpf_map_lookup_elem(&scratch, &zero);
if (!buf) return 0; /* can't happen */

p = bpf_dynptr_slice(dp, off, 16, buf);
if (p == NULL) {
    /* out of range */
} else {
    /* work with p directly */
}

/* if we wrote something to p and it was copied to buffer, write it back */
if (p == buf) {
     bpf_dynptr_write(dp, buf, 16);
}


We'll just need to teach verifier to make sure that buf is at least 16
byte long.

A fifth __sz arg may do:
bpf_dynptr_slice(const struct bpf_dynptr *ptr, u32 offset, u32 len, void *buffer, u32 buffer__sz);

The bpf prog usually has buffer in the stack for the common small header parsing.

One side note is the bpf_dynptr_slice() still needs to check if the skb is cloned or not even the off/len is within the head range.

But I wonder if for simple cases when users are mostly sure that they
are going to access only header data directly we can have an option
for bpf_dynptr_from_skb() to specify what should be the behavior for
bpf_dynptr_slice():

  - either return NULL for anything that crosses into frags (no
surprising perf penalty, but surprising NULLs);
  - do bpf_skb_pull_data() if bpf_dynptr_data() needs to point to data
beyond header (potential perf penalty, but on NULLs, if off+len is
within packet).

And then bpf_dynptr_from_skb() can accept a flag specifying this
behavior and store it somewhere in struct bpf_dynptr.

xdp does not have the bpf_skb_pull_data() equivalent, so xdp prog will still need the write back handling.




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux