On Tue, Dec 21, 2021 at 11:16 AM Tyler Wear (QUIC) <quic_twear@xxxxxxxxxxx> wrote: > > On Mon, Dec 20, 2021 at 07:18:42PM -0800, Yonghong Song wrote: > > > On 12/20/21 12:40 PM, Tyler Wear wrote: > > > > New bpf helper function BPF_FUNC_skb_change_dsfield "int > > > > bpf_skb_change_dsfield(struct sk_buff *skb, u8 mask, u8 value)". > > > > BPF_PROG_TYPE_CGROUP_SKB typed bpf_prog which currently can be > > > > attached to the ingress and egress path. The helper is needed > > > > because this type of bpf_prog cannot modify the skb directly. > > > > > > > > Used by a bpf_prog to specify DS field values on egress or ingress. > > > > > > Maybe you can expand a little bit here for your use case? > > > I know DS field might help but a description of your actual use case > > > will make adding this helper more compelling. > > +1. More details on the use case is needed. > > Also, having an individual helper for each particular header field is too specific. > > > > For egress, there is bpf_setsockopt() for IP_TOS and IPV6_TCLASS and it can be called in other cgroup hooks. e.g. > > BPF_PROG_TYPE_SOCK_OPS during tcp ESTABLISHED event. > > There is an example in tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c. > > Is it enough for egress? > > Using bpf_setsockopt() has 2 issues: 1) it changes the userspace visible state 2) won't work with udp sendmsg cmsg Right, so to clarify since I've been working with Tyler on a project of which this patch is a small component. Note, I may be wrong here, I don't fully understand how all of this works... but: ad 1) AFAIK if bpf calls bpf_setsockopt on the socket in question, then userspace's view of the socket settings via getsockopt(IP_TOS/IPV6_TCLASS) will also be affected - this may be undesirable (it's technically userspace visible change in behaviour and could, as unlikely as it is, lead to application misbehaviour). This can be worked around via also overriding getsockopt/setsockopt with bpf, but then you need to store the value to return to userspace somewhere... AFAICT it all ends up being pretty ugly and very complex. I wouldn't be worried about needing to override each individual field, as the only other field that looks likely to be potentially beneficial to override would be the ipv6 flowlabel. ad 2) I don't think the bpf_setsockopt(IP_TOS/IPV6_TCLASS) approach works for packets generated via udp sendmsg where cmsg is being used to set tos. 3) I also think the bpf_setsockopt(IP_TOS/IPV6_TCLASS) might be too late, since it would be in response to an already built packet, and would thus presumably only take effect on the next packet, and not for this one, no? Technically this could be done by attaching the programs to tc egress instead of the cgroup hook, but then it's per interface, which is potentially the wrong granularity... As for what is driving this? Upcoming wifi standard to allow access points to inform client devices how to dscp mark individual flows. As for the patch itself, I wonder if the return value shouldn't be reversed, currently '1 if the DS field is set, 0 if it is not set.' But I think returning 0 on success and an error on failure is more in line with what other bpf helpers do? OTOH, it does match bpf_skb_ecn_set_ce() returning 0 on failure... - Maciej