Jakub Kicinski <kuba@xxxxxxxxxx> writes:

> On Wed, 10 Feb 2021 11:53:53 +0100 Toke Høiland-Jørgensen wrote:
>> >> I am a bit confused now. Did you mean validation tests of those XDP
>> >> flags, which I am working on or some other validation tests?
>> >> What should these tests verify? Can you please elaborate more on the
>> >> topic, please - just a few sentences how are you see it?
>> >
>> > Conformance tests can be written for all features, whether they have
>> > an explicit capability in the uAPI or not. But for those that do IMO
>> > the tests should be required.
>> >
>> > Let me give you an example. This set adds a bit that says Intel NICs
>> > can do XDP_TX and XDP_REDIRECT, yet we both know of the Tx queue
>> > shenanigans. So can i40e do XDP_REDIRECT or can it not?
>> >
>> > If we have exhaustive conformance tests we can confidently answer that
>> > question. And the answer may not be "yes" or "no", it may actually be
>> > "we need more options because many implementations fall in between".
>> >
>> > I think readable (IOW not written in some insane DSL) tests can also
>> > be useful for users who want to check which features their program /
>> > deployment will require.
>>
>> While I do agree that that kind of conformance test would be great, I
>> don't think it has to hold up this series (the perfect being the enemy
>> of the good, and all that). We have a real problem today that userspace
>> can't tell if a given driver implements, say, XDP_REDIRECT, and so
>> people try to use it and spend days wondering which black hole their
>> packets disappear into. And for things like container migration we need
>> to be able to predict whether a given host supports a feature *before*
>> we start the migration and try to use it.
>
> Unless you have a strong definition of what XDP_REDIRECT means the flag
> itself is not worth much. We're not talking about normal ethtool feature
> flags which are primarily stack-driven, XDP is implemented mostly by
> the driver, each vendor can do their own thing. Maybe I've seen one
> vendor incompatibility too many at my day job to hope for the best...

I'm totally on board with documenting what a feature means. E.g., for
XDP_REDIRECT, whether it's acceptable to fail the redirect in some
situations even when it's active, or if there should always be a
slow-path fallback.

But I disagree that the flag is worthless without it. People are running
into real issues with trying to run XDP_REDIRECT programs on a driver
that doesn't support it at all, and it's incredibly confusing. The
latest example popped up literally yesterday:

https://lore.kernel.org/xdp-newbies/CAM-scZPPeu44FeCPGO=Qz=03CrhhfB1GdJ8FNEpPqP_G27c6mQ@xxxxxxxxxxxxxx/

>> I view the feature flags as a list of features *implemented* by the
>> driver. Which should be pretty static in a given kernel, but may be
>> different than the features currently *enabled* on a given system (due
>> to, e.g., the TX queue stuff).
>
> Hm, maybe I'm not being clear enough. The way XDP_REDIRECT (your
> example) is implemented across drivers differs in a meaningful ways.
> Hence the need for conformance testing. We don't have a golden SW
> standard to fall back on, like we do with HW offloads.

I'm not disagreeing that we need to harmonise what "implementing a
feature" means. Maybe I'm just not sure what you mean by "conformance
testing"? What would that look like, specifically? A script in selftest
that sets up a redirect between two interfaces that we tell people to
run? Or what? How would you catch, say, that issue where if a machine
has more CPUs than the NIC has TXQs things start falling apart?

> Also IDK why those tests are considered such a huge ask. As I said most
> vendors probably already have them, and so I'd guess do good distros.
> So let's work together.

I guess what I'm afraid of is that this will end up delaying or stalling
a fix for a long-standing issue (which is what I consider this series as
shown by the example above). Maybe you can alleviate that by expanding a
bit on what you mean?

-Toke
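
For the case raised above, where a machine has more CPUs than the NIC has
TX queues, one very rough pre-flight check could be sketched as follows.
This is only an illustration under stated assumptions: the function names
are made up, the "more CPUs than TXQs" comparison is an assumed heuristic
rather than any driver's documented behaviour, and it assumes the
per-interface queues/tx-<n> entries under /sys/class/net reflect the
queues that matter for XDP, which may not hold for every driver.

```python
import os


def count_tx_queues(queues_dir):
    """Count tx-<n> entries in a sysfs-style queues directory."""
    return sum(1 for name in os.listdir(queues_dir) if name.startswith("tx-"))


def cpus_exceed_txqs(n_cpus, n_txqs):
    """Hypothetical heuristic: flag hosts with more CPUs than TX queues."""
    return n_cpus > n_txqs


def check_iface(iface, sysfs_root="/sys/class/net"):
    """Compare the online CPU count with the TX queue count of one interface.

    sysfs_root is a parameter so the check can be pointed at a fake
    directory tree in tests; the default is the usual sysfs location.
    """
    queues_dir = os.path.join(sysfs_root, iface, "queues")
    n_txqs = count_tx_queues(queues_dir)
    n_cpus = os.cpu_count() or 1  # cpu_count() may return None
    return {
        "iface": iface,
        "cpus": n_cpus,
        "txqs": n_txqs,
        "at_risk": cpus_exceed_txqs(n_cpus, n_txqs),
    }
```

Calling check_iface() with an interface name returns the two counts and
the heuristic's verdict. It says nothing about whether a redirect
actually succeeds on that driver; that would still need a test that
sends packets between two interfaces and observes the result.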