Re: [PATCH bpf-next v1 1/3] bpf: Add skb dynptrs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 1, 2022 at 9:53 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>
> On Mon, Aug 1, 2022 at 8:51 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Mon, Aug 1, 2022 at 5:56 PM Martin KaFai Lau <kafai@xxxxxx> wrote:
> > >
> > > On Mon, Aug 01, 2022 at 04:23:16PM -0700, Martin KaFai Lau wrote:
> > > > On Mon, Aug 01, 2022 at 03:58:41PM -0700, Andrii Nakryiko wrote:
> > > > > On Mon, Aug 1, 2022 at 3:33 PM Martin KaFai Lau <kafai@xxxxxx> wrote:
> > > > > >
> > > > > > On Mon, Aug 01, 2022 at 02:16:23PM -0700, Joanne Koong wrote:
> > > > > > > On Mon, Aug 1, 2022 at 12:38 PM Martin KaFai Lau <kafai@xxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Mon, Aug 01, 2022 at 10:52:14AM -0700, Joanne Koong wrote:
> > > > > > > > > > Since we are on bpf_dynptr_write, what is the reason
> > > > > > > > > > on limiting it to the skb_headlen() ?  Not implying one
> > > > > > > > > > way is better than another.  would like to undertand the reason
> > > > > > > > > > behind it since it is not clear in the commit message.
> > > > > > > > > For bpf_dynptr_write, if we don't limit it to skb_headlen() then there
> > > > > > > > > may be writes that pull the skb, so any existing data slices to the
> > > > > > > > > skb must be invalidated. However, in the verifier we can't detect when
> > > > > > > > > the data slice should be invalidated vs. when it shouldn't (eg
> > > > > > > > > detecting when a write goes into the paged area vs when the write is
> > > > > > > > > only in the head). If the prog wants to write into the paged area, I
> > > > > > > > > think the only way it can work is if it pulls the data first with
> > > > > > > > > bpf_skb_pull_data before calling bpf_dynptr_write. I will add this to
> > > > > > > > > the commit message in v2
> > > > > > > > Note that current verifier unconditionally invalidates PTR_TO_PACKET
> > > > > > > > after bpf_skb_store_bytes().  Potentially the same could be done for
> > > > > > > > other new helper like bpf_dynptr_write().  I think this bpf_dynptr_write()
> > > > > > > > behavior cannot be changed later, so want to raise this possibility here
> > > > > > > > just in case it wasn't considered before.
> > > > > > >
> > > > > > > Thanks for raising this possibility. To me, it seems more intuitive
> > > > > > > from the user standpoint to have bpf_dynptr_write() on a paged area
> > > > > > > fail (even if bpf_dynptr_read() on that same offset succeeds) than to
> > > > > > > have bpf_dynptr_write() always invalidate all dynptr slices related to
> > > > > > > that skb. I think most writes will be to the data in the head area,
> > > > > > > which seems unfortunate that bpf_dynptr_writes to the head area would
> > > > > > > invalidate the dynptr slices regardless.
> > > > > > >
> > > > > > > What are your thoughts? Do you think you prefer having
> > > > > > > bpf_dynptr_write() always work regardless of where the data is? If so,
> > > > > > > I'm happy to make that change for v2 :)
> > > > > > Yeah, it sounds like an optimization to avoid unnecessarily
> > > > > > invalidating the sliced data.
> > > > > >
> > > > > > To be honest, I am not sure how often the dynptr_data()+dynptr_write() combo will
> > > > > > be used considering there is usually a pkt read before a pkt write in
> > > > > > the pkt modification use case.  If I got that far to have a sliced data pointer
> > > > > > to satisfy what I need for reading,  I would try to avoid making extra call
> > > > > > to dyptr_write() to modify it.
> > > > > >
> > > > > > I would prefer user can have similar expectation (no need to worry pkt layout)
> > > > > > between dynptr_read() and dynptr_write(), and also has similar experience to
> > > > > > the bpf_skb_load_bytes() and bpf_skb_store_bytes().  Otherwise, it is just
> > > > > > unnecessary rules for user to remember while there is no clear benefit on
> > > > > > the chance of this optimization.
> > > > > >
> > > > >
> > > > > Are you saying that bpf_dynptr_read() shouldn't read from non-linear
> > > > > part of skb (and thus match more restrictive bpf_dynptr_write), or are
> > > > > you saying you'd rather have bpf_dynptr_write() write into non-linear
> > > > > part but invalidate bpf_dynptr_data() pointers?
> > > > The latter.  Read and write without worrying about the skb layout.
> > > >
> > > > Also, if the prog needs to call a helper to write, it knows the bytes are
> > > > not in the data pointer.  Then it needs to bpf_skb_pull_data() before
> > > > it can call write.  However, after bpf_skb_pull_data(), why the prog
> > > > needs to call the write helper instead of directly getting a new
> > > > data pointer and write to it?  If the prog needs to write many many
> > > > bytes, a write helper may then help.
> > > After another thought, other than the non-linear handling,
> > > bpf_skb_store_bytes() / dynptr_write() is more useful in
> > > the 'BPF_F_RECOMPUTE_CSUM | BPF_F_INVALIDATE_HASH' flags.
> > >
> > > That said,  my preference is still to have the same expectation on
> > > non-linear data for both dynptr_read() and dynptr_write().  Considering
> > > the user can fall back to use bpf_skb_load_bytes() and
> > > bpf_skb_store_bytes(), I am fine with the current patch also.
> > >
> >
> > Honestly, I don't have any specific preference, because I don't have
> > much specific experience writing networking BPF :)
> >
> > But considering Jakub's point about trying to unify skb/xdp dynptr,
> > while I can see how we might have symmetrical dynptr_{read,write}()
> > for skb case (because you can pull skb), I believe this is not
> > possible with XDP (e.g., multi-buffer one), so bpf_dynptr_write()
> > would always be more limited for XDP case.
> >
> > Or maybe it is possible for XDP and I'm totally wrong here? I'm happy
> > to be educated about this!
>
> My understanding is that it's possible for XDP because the data in the
> frags are mapped [eg we can use skb_frag_address() to get the address
> and then copy into it with direct memcpys [0]] whereas skb frags are
> unmapped (eg access into the frag requires kmapping [1]).
>
> Maybe one solution is to add a function that does the mapping + write
> to a skb frag without pulling it to the head. This would allow
> bpf_dynptr_write to all data without needing to invalidate any dynptr
> slices. But I don't know whether this is compatible with recomputing
> the checksum or not, maybe the written data needs to be mapped (and
> hence part of head) so that it can be used to compute the checksum [2]
> - I'll read up some more on the checksumming code.
>
> I like your point Martin that if people are using bpf_dynptr_write,
> then they probably aren't using data slices much anyways so it
> wouldn't be too inconvenient that their slices are invalidated (eg if
> they are using bpf_dynptr_write it's to write into the skb frag, at
> which point they would need to call pull before bpf_dynptr_write,
> which would lead to same scenario where the data slices are
> invalidated). My main concern was that slices would be invalidated for
> bpf_dynptr_writes on data in the head area, but you're right that that
> shouldn't be too likely since they'd just be using a direct data slice
> access instead to read/write. I'll change it so that bpf_dynptr_write
> always succeeds and it'll always invalidate the data slices for v2.

On second thought, for v2 I plan to combine xdp and skb to 1 generic
function ("bpf_dynptr_from_packet") per Jakub's suggestion. I think in
that case then, for consistency, it'd be lbetter if we don't
invalidate the slices since it would be confusing if bpf_dynptr_write
invalidates slices for skb-type progs but not xdp-type ones. I think
bpf_dynptr_write returning an error for writes into the frag area for
skb type progs but not xdp type progs is less jarring than dynptr
slices being invalidated for skb type progs but not xdp type progs.

>
> [0] https://elixir.bootlin.com/linux/v5.19/source/net/core/filter.c#L3846
> [1] https://elixir.bootlin.com/linux/v5.19/source/net/core/skbuff.c#L2367
> [2] https://elixir.bootlin.com/linux/v5.19/source/include/linux/skbuff.h#L3839
>
> >
> > > >
> > > > >
> > > > > I guess I agree about consistency and that it seems like in practice
> > > > > you'd use bpf_dynptr_data() to work with headers and stuff like that
> > > > > at known locations, and then if you need to modify the rest of payload
> > > > > you'd do either bpf_skb_load_bytes()/bpf_skb_store_bytes() or
> > > > > bpf_dynptr_read()/bpf_dynptr_write() which would invalidate
> > > > > bpf_dynptr_data() pointers (but that would be ok by that time).
> > > > imo, read, write and then go back to read is less common.
> > > > writing bytes without first reading them is also less common.



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux