On Mon, Jul 29, 2024 at 03:37:10PM -0700, Dan Williams wrote:
> Jason Gunthorpe wrote:
> [..]
> > > We could say it can only be used for features we have 'opted' in +
> > > vendor defined features, but I'm not sure that helps. If a vendor
> > > defines a feature for generation A, and does what we want them to by
> > > proposing a spec addition they use in generation B, we would want a
> > > path to single upstream interface for both generations. So I don't
> > > think restricting this to particular classes of command helps us.
> >
> > My expectation for fwctl was that it would own things that are
> > reasonably sharable by the kernel and userspace.
> >
> > As an example, instead of a turning on a feature dynamically at run
> > time, you'd want to instead tell the FW that on next reboot that
> > feature will be forced on.
> >
> > Another take would be things that are clearly contained to fwctl
> > multi-instance features where fwctl gets its own private thing that
> > cannot disturb the kernel.
> >
> > I'm really not familiar with cxl to give any comment here - but
> > dynamically control the single global scrubber unit seems like a poor
> > fit to me.
>
> Right, one of the mistakes from NVDIMM that was corrected for CXL was to
> explicitly remove the passthrough capability for global state machine
> controls like scrubbing.
>
> Many of the "Immediate Configuration Change" CXL commands fall into this
> bucket of things that may want to have a kernel-managed global view
> rather than let userspace and the kernel get into fights about the
> configuration. So, I think it is reasonable to say that scrub has a
> kernel interface that goes through EDAC and not fwctl.
>
> For the "anonymous" "Features" that advertise an "Immediate
> Configuration Change" effect those need CAP_SYS_RAWIO at a minimum,
> possibly a kernel taint, and/or compile time option to block them. Maybe
> that encourages more "Configuration Change after Reset" Set Feature
> capabilities which carry less risk of confusing a running kernel.

I think a solid consensus on the topics above would be really useful for
gpu/accel too. We're still busy with more pressing community/ecosystem
building needs, but gpu fw has become rather complex and it's not
stopping. And there are random other devices attached too nowadays, so
fwctl makes a ton of sense.

But for me the more important stuff would be some clear guidelines on
what should go into cross-device subsystems like edac (or other ras
features), what should go into functional subsystems like netdev, rdma,
gpu/accel, ... whatever else, and what should be exposed through special
purpose subsystems like hwmon. And then also what the access control
guidelines should be around tainting and permission checks. We've got
plenty of experience in enforcing such a community contract with vendors,
but the hard part is creating a clear and ideally concise documentation
page I can just point vendors at as the ground truth.

I'm also not super worried about the uapi breakage scenarios. We have
plenty of experience in drm with sometimes horrendous hacks to keep
existing userspace and real-world use-cases going, while still being able
to move the subsystem forward and standardize more stuff as it starts to
make sense. But the "here's the goal" documentation, maybe with the
occasional update when a new subsystem like edac shows up, is imo the hard
part and would be really, really useful.

Cheers, Sima
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch