> > > > This is wrong.
> > > > CGROUP_INET_EGRESS bpf prog cannot arbitrary change packet data.

I agree with this sentiment, which is why the original proposal was
simply to add a helper which is only capable of modifying the
tos/tclass/dscp field, and not any arbitrary bytes.
(note: there already is such a helper to set the ECN congestion
notification bits, so there's somewhat of a precedent)

> > > > The networking stack populated the IP header at that point.
> > > > If the prog changes it to something else it will be confusing other
> > > > layers of stack. neigh(L2) will be wrong, etc.
> > > > We can still change certain things in the packet, but not arbitrary bytes.
> > > >
> > > > We cannot change the DS field directly in the packet either.

This part I won't agree with. In most cases there is no DSCP based
routing decision, in which case it seems perfectly reasonable to change
the DSCP bits here. Indeed last I checked (though this was a few years
ago) the ipv4 tos routing code wasn't even capable of making sane
decisions, because it looks at the bottom 4 bits of the TOS field,
instead of the top 6 bits, ie. you can route on ECN bits, but you can't
route on the full DSCP field. Additionally afaik the ipv6 tclass
routing simply wasn't implemented. However, I last had to deal with
this probably half a decade ago, on even older kernels, so perhaps the
situation has changed.

Additionally DSCP bits may affect transmit queue selection (for
something like wifi qos / traffic prioritization across multiple
transmit queues with different air-time behaviours - which can use
dscp), so ideally we need dscp to be set *before* the mq qdisc /
dispatch. I think this implies it needs to happen before tc (though
again, I'm not too certain of the ordering here).

> > > > It can only be changed by changing its value in the socket.

Changing it directly in the socket has two problems:

- it becomes visible to userspace which is undesirable
  (ie. I've run across userspace code which will set tos to A, then
  read it back and exit/fail/crash if it doesn't see A)

- if the tos bits themselves are an input to the decision about what
  tos bits to actually use, then this becomes recursive and basically
  impossible to get right.
  (for example ssh sets tos to different values for interactive/bulk
  (ie. copy) traffic, so using application selected tos to select wire
  tos is perfectly reasonable)

> > > Why is the DS field unchangeable, but ecn is changeable?
> >
> > Per spec the requirement is to modify the ds field of egress packets with DSCP value. Setting ds field on socket will not suffice here.
> > Another case is where device is a middle-man and needs to modify the packets of a connected tethered client with the DSCP value, using a sock will not be able to change the packet here.
>
> If DS field needs to be changed differently for every packet
> it's better to use TC layer for this task.
> qdiscs may send packets with different DSs to different queues.
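
To make the "dscp only, not arbitrary bytes" point concrete, here is a
minimal userspace sketch of what such a rewrite amounts to on an IPv4
header: replace the top 6 bits of the tos byte, leave the 2 ECN bits
alone, and fix up the header checksum incrementally (RFC 1624). This is
not kernel code and not the proposed helper; all function names below
are hypothetical, and it assumes a plain byte buffer holding the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DSCP_SHIFT 2      /* DSCP is the top 6 bits of tos/tclass */
#define ECN_MASK   0x03u  /* ECN is the bottom 2 bits */

/* Replace the DSCP bits of a tos byte, preserving the ECN bits. */
static uint8_t set_dscp(uint8_t tos, uint8_t dscp)
{
	return (uint8_t)(((dscp & 0x3fu) << DSCP_SHIFT) | (tos & ECN_MASK));
}

/* RFC 1624 incremental update: HC' = ~(~HC + ~m + m') */
static uint16_t csum_update16(uint16_t check, uint16_t old_word,
			      uint16_t new_word)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~old_word;
	sum += new_word;
	sum = (sum & 0xffff) + (sum >> 16);	/* fold carries */
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Full ones-complement checksum over an even-length header.
 * Returns 0 when run over a header whose checksum field is valid. */
static uint16_t csum_full(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hypothetical "dscp only" rewrite: touches hdr[1] (tos) and
 * hdr[10..11] (checksum), nothing else in the header. */
static void ipv4_set_dscp(uint8_t *hdr, uint8_t dscp)
{
	uint16_t old_word, new_word, check;

	assert(dscp < 64);
	old_word = (uint16_t)((hdr[0] << 8) | hdr[1]);
	hdr[1] = set_dscp(hdr[1], dscp);
	new_word = (uint16_t)((hdr[0] << 8) | hdr[1]);

	check = (uint16_t)((hdr[10] << 8) | hdr[11]);
	check = csum_update16(check, old_word, new_word);
	hdr[10] = (uint8_t)(check >> 8);
	hdr[11] = (uint8_t)(check & 0xff);
}
```

Calling ipv4_set_dscp(hdr, 46) on a header whose tos byte is 0x01
leaves tos at 0xb9 (DSCP 46, ECN bits unchanged), and csum_full() over
the 20 byte header still returns 0. IPv6 would need separate handling,
since tclass straddles two bytes and there is no header checksum.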