> Enter the fwctl proposal [1]. From the CXL subsystem perspective it > looks like a long-term solution to the problem of managing expectations > between hardware vendors and mainline subsystems. It disclaims support > for the fast-path (data-plane) and is targeted at the long tail of > slow-path (config/debug plane) device-specific operations that are often > uninteresting to mainline. It sets expectations that the device must > advertise the effect of all commands so that the kernel can deploy > reasonable Kernel Lockdown policy, or otherwise require CAP_SYS_RAWIO > for commands that may affect user-data. It sets common expectations for > device designers, distribution maintainers, and kernel developers. It is > complimentary to the Linux-command path for operations that need deeper > kernel coordination. I'm reasonably on board with the basic justification for the fwctl proposal (and it may solve some long term challenges for us), but I'm concerned by ABI guarantees. In particularly what happens when we get decision wrong and expose something via fwctl that we later have more general kernel support for. This was triggered by Dave Jiang's (perhaps unintended) use of Patrol Scrub as a test case for the CXL FWCTL RFC. https://lore.kernel.org/linux-cxl/20240729130528.0000139b@xxxxxxxxxx/T/#t I'm using this specific example here, but I think it is a more general question. By exposing the Get / Set feature controls, a lot of effective user space ABI is added (some of it setting a taint). Scrub control is an interesting example, because there is an active proposal to extend EDAC to cover this and similar RAS related control features, something we are going to be discussing at LPC. https://lore.kernel.org/linux-cxl/20240726160556.2079-1-shiju.jose@xxxxxxxxxx/ One of the key bits of feedback we've had on that series is that it should be integrated with EDAC. Part of the reason being need to get appropriate RAS expert review. Something fwctl won't naturally get. If we expose that particular Feature via Set Feature we may run into future problems. It is probably possible to make the driver stateless so any interference from a userspace program using fwctl is not fatal - in this case userspace code should probably be safe to state changes anyway. We know about this clash today, so could easily block fwctl from exposing this feature, but it is illustrative of a wider problem. We will get some decisions about what should be exposed via fwctl wrong in the long term, even if they are correct at time of initial decision. So how do we cope with that? 1) Make no guarantees on ABI for taint causing operations. So we can block this FWCTL in a kernel if EDAC / ras control is in place for the same feature. I'm fine with this but it's not obviously a correct thing to do! 2) Allow the footgun. Keep the fwctl interface and harden the other kernel support against state changes that result. If userspace code breaks, then tough luck. (Another form of ABI break, perhaps comprehended by existing proposed FWCTL rules). 3) We are stuck for ever with not supporting anything via other interfaces that would break if fwctl was in use. Ouch. Note that I think this only matters for the Set path as Get side shouldn't have side effects and is fine to expose without synchronization with a clear statement that values read are a snapshot only. So before I'd be happy with fwctl in CXL, I'd want a very clear policy decision on this that isn't going leave us in a mess long term. We could say it can only be used for features we have 'opted' in + vendor defined features, but I'm not sure that helps. If a vendor defines a feature for generation A, and does what we want them to by proposing a spec addition they use in generation B, we would want a path to single upstream interface for both generations. So I don't think restricting this to particular classes of command helps us. Jonathan