Re: [PATCH v9 bpf-next 3/5] bpf: Add skb dynptrs

Joanne Koong <joannelkoong@xxxxxxxxx> · Tue, 31 Jan 2023 13:29:34 -0800

On Tue, Jan 31, 2023 at 11:50 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, Jan 31, 2023 at 9:55 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> >
> > On Mon, Jan 30, 2023 at 9:36 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Mon, Jan 30, 2023 at 04:44:12PM -0800, Joanne Koong wrote:
> > > > On Sun, Jan 29, 2023 at 3:39 PM Alexei Starovoitov
> > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Jan 27, 2023 at 11:17:01AM -0800, Joanne Koong wrote:
> > > > > > Add skb dynptrs, which are dynptrs whose underlying pointer points
> > > > > > to a skb. The dynptr acts on skb data. skb dynptrs have two main
> > > > > > benefits. One is that they allow operations on sizes that are not
> > > > > > statically known at compile-time (eg variable-sized accesses).
> > > > > > Another is that parsing the packet data through dynptrs (instead of
> > > > > > through direct access of skb->data and skb->data_end) can be more
> > > > > > ergonomic and less brittle (eg does not need manual if checking for
> > > > > > being within bounds of data_end).
> > > > > >
> > > > > > For bpf prog types that don't support writes on skb data, the dynptr is
> > > > > > read-only (bpf_dynptr_write() will return an error and bpf_dynptr_data()
> > > > > > will return a data slice that is read-only where any writes to it will
> > > > > > be rejected by the verifier).
> > > > > >
> > > > > > For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write()
> > > > > > interfaces, reading and writing from/to data in the head as well as from/to
> > > > > > non-linear paged buffers is supported. For data slices (through the
> > > > > > bpf_dynptr_data() interface), if the data is in a paged buffer, the user
> > > > > > must first call bpf_skb_pull_data() to pull the data into the linear
> > > > > > portion.
> > > > >
> > > > > Looks like there is an assumption in parts of this patch that
> > > > > linear part of skb is always writeable. That's not the case.
> > > > > See if (ops->gen_prologue || env->seen_direct_write) in convert_ctx_accesses().
> > > > > For TC progs it calls bpf_unclone_prologue() which adds hidden
> > > > > bpf_skb_pull_data() in the beginning of the prog to make it writeable.
> > > >
> > > > I think we can make this assumption? For writable progs (referenced in
> > > > the may_access_direct_pkt_data() function), all of them have a
> > > > gen_prologue that unclones the buffer (eg tc_cls_act, lwt_xmit, sk_skb
> > > > progs) or their linear portion is okay to write into by default (eg
> > > > xdp, sk_msg, cg_sockopt progs).
> > >
> > > but the patch was preserving seen_direct_write in some cases.
> > > I'm still confused.
> >
> > seen_direct_write is used to determine whether to actually unclone or
> > not in the program's prologue function (eg tc_cls_act_prologue() ->
> > bpf_unclone_prologue() where in bpf_unclone_prologue(), if
> > direct_write was not true, then it can skip doing the actual
> > uncloning).
> >
> > I think the part of the patch you're talking about regarding
> > seen_direct_write is this in check_helper_call():
> >
> > + if (func_id == BPF_FUNC_dynptr_data &&
> > +    dynptr_type == BPF_DYNPTR_TYPE_SKB) {
> > +   bool seen_direct_write = env->seen_direct_write;
> > +
> > +   regs[BPF_REG_0].type |= DYNPTR_TYPE_SKB;
> > +   if (!may_access_direct_pkt_data(env, NULL, BPF_WRITE))
> > +     regs[BPF_REG_0].type |= MEM_RDONLY;
> > +   else
> > +     /*
> > +     * Calling may_access_direct_pkt_data() will set
> > +     * env->seen_direct_write to true if the skb is
> > +     * writable. As an optimization, we can ignore
> > +     * setting env->seen_direct_write.
> > +     *
> > +     * env->seen_direct_write is used by skb
> > +     * programs to determine whether the skb's page
> > +     * buffers should be cloned. Since data slice
> > +     * writes would only be to the head, we can skip
> > +     * this.
> > +     */
> > +     env->seen_direct_write = seen_direct_write;
> > + }
> >
> > If the data slice for a skb dynptr is writable, then seen_direct_write
> > gets set to true (done internally in may_access_direct_pkt_data()) so
> > that the skb is actually uncloned, whereas if it's read-only, then
> > env->seen_direct_write gets reset to its original value (since the
> > may_access_direct_pkt_data() call will have set env->seen_direct_write
> > to true)
>
> I'm still confused.
> When may_access_direct_pkt_data() returns false
> it doesn't change seen_direct_write.
> When it returns true it also sets seen_direct_write=true.
> But the code above restores it to whatever value it had before.
> How is this correct?
> Are you saying that another may_access_direct_pkt_data() gets
> called somewhere in the verifier that sets seen_direct_write=true?
> But what's the harm in doing it twice or N times in all cases?

I'm confused now too. I added this in v7, judging from the comment
block, I think I added this because I thought uncloning an skb only
needs to happen if the skb's page buffers get written to (aka only if
the skb needs to be pulled), not if it's linear portion gets written
to. This is incorrect - writing to the linear part also needs to
unclone the skb. I will fix this section when I resubmit