On Tue, Dec 21, 2021 at 11:46:40AM -0800, Maciej Żenczykowski wrote: > On Tue, Dec 21, 2021 at 11:16 AM Tyler Wear (QUIC) > <quic_twear@xxxxxxxxxxx> wrote: > > > On Mon, Dec 20, 2021 at 07:18:42PM -0800, Yonghong Song wrote: > > > > On 12/20/21 12:40 PM, Tyler Wear wrote: > > > > > New bpf helper function BPF_FUNC_skb_change_dsfield "int > > > > > bpf_skb_change_dsfield(struct sk_buff *skb, u8 mask, u8 value)". > > > > > BPF_PROG_TYPE_CGROUP_SKB typed bpf_prog which currently can be > > > > > attached to the ingress and egress path. The helper is needed > > > > > because this type of bpf_prog cannot modify the skb directly. > > > > > > > > > > Used by a bpf_prog to specify DS field values on egress or ingress. > > > > > > > > Maybe you can expand a little bit here for your use case? > > > > I know DS field might help but a description of your actual use case > > > > will make adding this helper more compelling. > > > +1. More details on the use case is needed. > > > Also, having an individual helper for each particular header field is too specific. > > > > > > For egress, there is bpf_setsockopt() for IP_TOS and IPV6_TCLASS and it can be called in other cgroup hooks. e.g. > > > BPF_PROG_TYPE_SOCK_OPS during tcp ESTABLISHED event. > > > There is an example in tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c. > > > Is it enough for egress? > > > > Using bpf_setsockopt() has 2 issues: 1) it changes the userspace visible state 2) won't work with udp sendmsg cmsg > > Right, so to clarify since I've been working with Tyler on a project > of which this patch is a small component. > Note, I may be wrong here, I don't fully understand how all of this > works... but: > > ad 1) AFAIK if bpf calls bpf_setsockopt on the socket in question, > then userspace's view of the socket settings via > getsockopt(IP_TOS/IPV6_TCLASS) will also be affected - this may be > undesirable (it's technically userspace visible change in behaviour > and could, as unlikely as it is, lead to application misbehaviour). > This can be worked around via also overriding getsockopt/setsockopt > with bpf, but then you need to store the value to return to userspace > somewhere... AFAICT it all ends up being pretty ugly and very complex. CGROUP_(SET|GET)SOCKOPT is created for that. The user's value can be stored in bpf_sk_storage. > > I wouldn't be worried about needing to override each individual field, > as the only other field that looks likely to be potentially beneficial > to override would be the ipv6 flowlabel. > > ad 2) I don't think the bpf_setsockopt(IP_TOS/IPV6_TCLASS) approach > works for packets generated via udp sendmsg where cmsg is being used > to set tos. There is CGROUP_UDP[4|6]_SENDMSG. Right now, it can only change the addr. tos/tclass support could be added. > 3) I also think the bpf_setsockopt(IP_TOS/IPV6_TCLASS) might be too > late, since it would be in response to an already built packet, and > would thus presumably only take effect on the next packet, and not for > this one, no? The bpf_setsockopt can be called in bind and connect. Is it not early enough? > Technically this could be done by attaching the programs to tc egress > instead of the cgroup hook, but then it's per interface, which is > potentially the wrong granularity... Right, there is advantage to do it at higher layer, and earlier also. If the tos/tclass value can be changed early on, the correct ip[6] header can be written at the beginning instead of redoing it later and need to worry about the skb_clone_writable(), rewriting it, do iph->check..etc. > As for what is driving this? Upcoming wifi standard to allow access > points to inform client devices how to dscp mark individual flows. Interesting. How does the sending host get this dscp value from wifi and then affect the dscp of a particular flow? Is the dscp going to be stored in a bpf map for the bpf prog to use? Are you testing on TCP also? > As for the patch itself, I wonder if the return value shouldn't be > reversed, currently '1 if the DS field is set, 0 if it is not set.' > But I think returning 0 on success and an error on failure is more in > line with what other bpf helpers do? > OTOH, it does match bpf_skb_ecn_set_ce() returning 0 on failure... If adding a helper , how about making the bpf_skb_store_bytes() available to BPF_PROG_TYPE_CGROUP_SKB? Then it will be more flexible to change other fields in the future in the network and transport header.