Re: [Patch bpf-next v3 4/5] skmsg: use skb ext instead of TCP_SKB_CB

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 15, 2021 at 5:50 PM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
>
> Cong Wang wrote:
> > On Mon, Feb 15, 2021 at 4:54 PM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
> > >
> > > Cong Wang wrote:
> > > > On Mon, Feb 15, 2021 at 3:57 PM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
> > > > >
> > > > > For TCP case we can continue to use CB and not pay the price. For UDP
> > > > > and AF_UNIX we can do the extra alloc.
> > > >
> > > > I see your point, but specializing TCP case does not give much benefit
> > > > here, the skmsg code would have to check skb->protocol etc. to decide
> > > > whether to use TCP_SKB_CB() or skb_ext:
> > > >
> > > > if (skb->protocol == ...)
> > > >   TCP_SKB_CB(skb) = ...;
> > > > else
> > > >   ext = skb_ext_find(skb);
> > > >
> > > > which looks ugly to me. And I doubt skb->protocol alone is sufficient to
> > > > distinguish TCP, so we may end up having more checks above.
> > > >
> > > > So do you really want to trade code readability with an extra alloc?
> > >
> > > Above is ugly. So I look at where the patch replaces things,
> > >
> > > sk_psock_tls_strp_read(), this is TLS specific read hook so can't really
> > > work in generic case anyways.
> > >
> > > sk_psock_strp_read(), will you have UDP, AF_UNIX stream parsers? Do these
> > > even work outside TCP cases.
> > >
> > > For these ones: sk_psock_verdict_apply(), sk_psock_verdict_recv(),
> > > sk_psock_backlog(), can't we just do some refactoring around their
> > > hook points so we know the context. For example sk_psock_tls_verdict_apply
> > > is calling sk_psock_skb_redirect(). Why not have a sk_psock_unix_redirect()
> > > and a sk_psock_udp_redirect(). There are likely some optimizations we can
> > > deploy this way. We've already don this for tls and sk_msg types for example.
> > >
> > > Then the helpers will know their types by program type, just use the right
> > > variants.
> > >
> > > So not suggestiong if/else the checks so much as having per type hooks.
> > >
> >
> > Hmm, but sk_psock_backlog() is still the only one that handles all three
> > above cases, right? It uses TCP_SKB_CB() too and more importantly it
> > is also why we can't use a per-cpu struct here (see bpf_redirect_info).
>
> Right, but the workqueue is created at init time where we will know the
> socket type based on the program/map types so can build the redirect
> backlog queue there based on the type needed. I also have a patch in

Hmm? How could a socket type match the skb type when we redirect
across-protocol?

In my use case, I want to redirect an AF_UNIX skb to a UDP socket,
clearly checking the UDP socket workqueue can't find out it is an
AF_UNIX skb. It has to be a per-skb check.

> mind that would do more specific TCP things in that code anyways. I
> can flush it out this week if anyone cares. The idea is we are wasting
> lots of cycles using skb_send_sock_locked when we can just inject
> the packet directlyy into the tcp stack.

Actually I did try the same, it clearly doesn't work for cross-protocol.

Anyway, please let me know what your suggestion for skb ext here?

It looks like we either have some ugly packet type checks, or just
use the skb ext. I don't see any other way yet, I also explored the
struct sk_buff again and still can not find anything we can reuse.
(_skb_refdst can only be reused very briefly with
tcp_skb_tsorted_save().)

Therefore, I believe using skb ext is still the best solution here.

>
> Also on the original patch here, we can't just kfree_skb() on alloc
> errors because that will look like a data drop. Errors need to be
> handled gracefully without dropping data. At least in the TCP case,
> but probably also in UDP and AF_UNIX cases as well. Original code
> was pretty loose in this regard, but it caused users to write bug
> reports and then I've been fixing most of them. If you see more
> cases let me know.

What's your suggestion here? Return -EAGAIN? But it requires
the caller put it in a loop to be graceful, but we can't do it in, for
example, sk_psock_tls_strp_read().

Thanks.



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux