On Mon, Nov 07, 2022 at 10:24:40AM -0800, Jakub Kicinski wrote: > IIRC the skb extensions were initially proposed as a way to handle rare > exception packets (e.g. first packet of a connection-tracked flow in OvS > offloads). Also MPTCP but that's also edge / slow path (sorry MPTCP). > > Now the usage is spreading and I have to keep fighting to keep them out > of the datacenter production kernel I co-maintain. > > So, yeah, I hate them :) To be fair, most people see CPU-terminated traffic on a DSA switch (i.e. what you see in net/dsa/tag_*.c) as slow path which needs no optimization. It's not that I condone this, but it's factually true. If it wasn't the case, then out of the drivers I maintain, a control packet wouldn't be delivered via SPI on SJA1105, and flow control wouldn't be broken on the CPU port of switches from the Ocelot family. And more importantly, software bridging between a switchdev and a non-switchdev port wouldn't be such an oversight for more than 3/4 of all switch drivers. Also, I didn't really get *why* you hate them, just that you do. Seems circular: slow => hate; hate => slow? I don't think that skb->_skb_refdst is the hallmark of clean or simple designs either, a pointer and a refcount bit squashed into a single sk_buff field that is also in a union with 2 other things, and which is reused in other network layers for purposes that have nothing to do with L3 routing. Nope, sorry, this is highly optimized design at its finest, true, but I have no interest in doing mental gymnastics in order to maintain such a thing, just because some hardware manufacturer thought that it would be a smart idea to split up device ownership in this way, and neither build a 'switch with rings' nor a 'switch with tags', but rather 'a switch with somebody else's rings'. The people who built this monstrosity should step in and maintain the software architecture that's a direct consequence of their design choices. Otherwise I'm going to opt for the simplest thing to maintain that works. It's unfair to not care about software support for your own hardware enough to study frameworks beforehand, *and* to complain about performance. > > > > The latter are more general; AFAIK they can be used between any layer > > > > and any other layer, like for example between RX and TX in the > > > > forwarding path. > > > > > > You can't be using lower-dev / upper-dev metadata across forwarding, > > > how would that ever work? > > > > What makes metadata dst's preferable to skb extensions? > > ~~~~~~~~~~~~ ~~~~~~~~~~~~~~ > > former latter > > > > I said: "The latter [aka skb extensions, not metadata dst's] are more general". > > I did not say that you can use metadata dst's across forwarding, quite > > the opposite. > > No, no, I'm asking how you'd use either. I'm questioning the entire > flow, not whether either mechanism can be used to fulfill it. Well, we are probably talking about different things. I said that skb extensions are a more general concept which *allows* you to pass metadata from the iif to the oif. Metadata dst's don't. So what is the need for metadata dst's, if skb extensions can do what those can do, and more. Not that this use case is particularly relevant to DSA OOB. Just that I think a reasonable expectation would have been to make skb extensions more performant, than to introduce a parallel mechanism. > TBH I mostly have experience on the Tx side, given that on the Rx side > there is no queuing so the entire abstraction of tag implementation > being separate is not strictly necessary. But if you find that the Rx > doesn't work, and you really want the skb extensions - then, well, > I acquiesce. And hope the Meta prod kernel never needs OOB DSA :) I would never enable this feature, either. I would love not having to see it. > > > > More importantly, what happens if a DSA switch is used together with a > > > > SRIOV-capable DSA master which already uses METADATA_HW_PORT_MUX for > > > > PF-VF communication? (if I understood the commit message of 3fcece12bc1b > > > > ("net: store port/representator id in metadata_dst") correctly) > > > > > > Let's be clear that the OOB metadata model only works if both upper and > > > lower are aware of the metadata. In fact they are pretty tightly bound. > > > So chances of a mismatch are extremely low and theorizing about them is > > > academic. > > > > Legally I'm not allowed to say too much, but let's say I've heard about > > something which makes the above not theoretical. Anyway, let's assume > > it's not a concern. > > But in that case the same vendor designs both ends, right? Yes. > So there should be no conflict between the metadata assigned for reprs > vs dsa ports. Can't say more, sorry. > > > In general, I'm not sure if pretending this is DSA is not an unnecessary > > > complication which will end up hurting both ends of the equation. > > > > This is a valid point. We've refused wacky "not DSA, not switchdev" > > hardware before: > > https://lore.kernel.org/netdev/20201125232459.378-1-lukma@xxxxxxx/ > > There's also the option of doing what I did with ocelot/felix: a common > > switch lib and 2 distinct front-ends, one switchdev and one DSA. > > Exactly. > > > Not a lot of people seem to be willing to put that effort in, though. > > The imx28 patch set was eventually abandoned. I though I'd try a > > different approach this time. Idk, maybe it's a waste of time. > > Yeah, it's a balancing act. Please explore the metadata option, I think > most people jump to the skb extension because they don't know about > metadata. If you still want skb extension after, I'll look away. Well, I guess I'm still not really convinced about metadata_dst, you're still not really convinced about skb extensions, but what we have in common is that "one switch lib, two front ends" is an alternative worth exploring as a design that's both clean and efficient? :)