Re: [PATCH] Bpf Helper Function BPF_FUNC_skb_change_dsfield

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 21, 2021 at 02:13:04PM -0800, Maciej Żenczykowski wrote:
> > > ad 1) AFAIK if bpf calls bpf_setsockopt on the socket in question,
> > > then userspace's view of the socket settings via
> > > getsockopt(IP_TOS/IPV6_TCLASS) will also be affected - this may be
> > > undesirable (it's technically userspace visible change in behaviour
> > > and could, as unlikely as it is, lead to application misbehaviour).
> > > This can be worked around via also overriding getsockopt/setsockopt
> > > with bpf, but then you need to store the value to return to userspace
> > > somewhere... AFAICT it all ends up being pretty ugly and very complex.
> > CGROUP_(SET|GET)SOCKOPT is created for that.
> > The user's value can be stored in bpf_sk_storage.
> 
> Yes, it can be done, it's very complex to do so.
> 
> The policy can change during run time (indeed that's probably a
> relatively likely situation,
> network gear notices a new high bandwidth connection and provides out
> of band feedback
> that it should be using a different dscp code point - we probably
> don't want the full policy to
> be present in the device because it might be a huge number of entries,
> with wildcards).
ic. got it. The value is very dynamic, so changing tos/tclass of a sock
in a more static hook like bind won't work well.

Details like this should have been in the commit message.

> > > I wouldn't be worried about needing to override each individual field,
> > > as the only other field that looks likely to be potentially beneficial
> > > to override would be the ipv6 flowlabel.
> > >
> > > ad 2) I don't think the bpf_setsockopt(IP_TOS/IPV6_TCLASS) approach
> > > works for packets generated via udp sendmsg where cmsg is being used
> > > to set tos.
> > There is CGROUP_UDP[4|6]_SENDMSG.  Right now, it can only change the addr.
> > tos/tclass support could be added.
> 
> It could, that doesn't seem easier to do than this approach though.
> 

[ ... ]

> > > Technically this could be done by attaching the programs to tc egress
> > > instead of the cgroup hook, but then it's per interface, which is
> > > potentially the wrong granularity...
> 
> > Right, there is advantage to do it at higher layer,
> > and earlier also.
> >
> > If the tos/tclass value can be changed early on, the correct
> > ip[6] header can be written at the beginning instead
> > of redoing it later and need to worry about the skb_clone_writable(),
> > rewriting it, do iph->check..etc.
> 
> I would indeed like it if we could decouple what userspace wants,
> from what the kernel/network actually uses.  There would need to be
> some sort of bpf hook,
> that takes a socket/flow and returns the tos/dscp to actually use
> (based on 5-tuple and other information).
> 
> But again, this would be *much* more complex.
In terms of the bpf prog doing the dscp-logic, it should be
pretty much the same.  Getting the 5-tuple, lookup from a map
and return the dscp value.

> > > As for what is driving this?  Upcoming wifi standard to allow access
> > > points to inform client devices how to dscp mark individual flows.
> > Interesting.
> >
> > How does the sending host get this dscp value from wifi
> > and then affect the dscp of a particular flow?  Is the dscp
> > going to be stored in a bpf map for the bpf prog to use?
> 
> It gets it out of band via some wifi signaling mechanism.
> Tyler probably knows the details.
> 
> Storing flow match information to dscp mapping in a bpf map is indeed the plan.
> 
> > Are you testing on TCP also?
> >
> > > As for the patch itself, I wonder if the return value shouldn't be
> > > reversed, currently '1 if the DS field is set, 0 if it is not set.'
> > > But I think returning 0 on success and an error on failure is more in
> > > line with what other bpf helpers do?
> > > OTOH, it does match bpf_skb_ecn_set_ce() returning 0 on failure...
> 
> > If adding a helper , how about making the bpf_skb_store_bytes()
> > available to BPF_PROG_TYPE_CGROUP_SKB?  Then it will be
> > more flexible to change other fields in the future in
> > the network and transport header.
> 
> I assume there's some reason why it's not available?
Not that I am aware of.  There was just no use case for it.
The static case is handled in socket bind/creation time.
The more dynamic case is handled in the udp_sendmsg which so
far the use case is changing the address only.

I don't mind adding bpf_skb_store_bytes() which could be useful
for other header's fields.



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux