On Tue, Aug 06, 2024 at 09:14:20AM +0200, Daniel Vetter wrote:
> On Thu, Aug 01, 2024 at 11:22:23AM -0300, Jason Gunthorpe wrote:
> > On Tue, Jul 30, 2024 at 09:13:00AM +0200, Daniel Vetter wrote:
> > > I think a solid consensus on the topics above would be really useful for
> > > gpu/accel too. We're still busy with more pressing community/ecosystem
> > > building needs, but gpu fw has become rather complex and it's not
> > > stopping. And there's random other devices attached too nowadays, so fwctl
> > > makes a ton of sense.
> >
> > Yeah, I'm pretty sure GPU is going to need fwctl too, the GPU's are
> > going to have the same issues as NIC does. I see people are already
> > struggling with topics like how to get debug traces out of the GPU FW.
> >
> > > But for me the more important stuff would be some clear guidelines like
> > > what should be in other more across-devices subsystems like edac (or other
> > > ras features), what should be in functional subsystems like netdev, rdma,
> > > gpu/accel, ... whatever else, and what should be exposed through some
> > > special purpose subsystems like hwmon.
> >
> > In my mind the most important part is that fwctl is not exclusive, the
> > FW interface and things being manipulated must be sharable or blocked
> > from fwctl. We should never get in a situation where a fwctl
> > implementation becomes a reason we cannot have a functional subsystem
> > interface.
>
> Hm still not clear to me how you want to achieve that, but I guess best
> I'll jump over to the fwctl thread and ask about those details
> there.

I'm looking at it from the perspective of mlx5, which has deep
multi-user support in the FW. There is almost nothing in the interface
that is "global" and would become a problem. Everything else can
reasonably be shared, and often already is.

I think that would have to be the baseline for what you could expose
here.

Like with the memory scrubbing example. It would be fine if fwctl can
read any related counters concurrently with the EDAC driver reading
the same counters. But fwctl shouldn't clear counters or program a
single global scrubber unit.

This limitation has to be baked into the FW/driver on the fwctl side,
which has to understand and block these things.

Jason
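
To illustrate the read-vs-clear/global split described above, here is a
rough sketch of the kind of filter a fwctl driver could apply before
passing a command to the FW. Everything in it is made up for
illustration: the opcodes, the effect classes and the function names are
assumptions, not the mlx5 command set or the actual fwctl code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of what a FW command does to shared state. */
enum cmd_effect {
	CMD_READ_SHARED,	/* reads state others may read concurrently */
	CMD_CLEAR_SHARED,	/* clears/resets counters other users rely on */
	CMD_WRITE_GLOBAL,	/* programs a single device-wide unit */
};

struct fw_cmd_desc {
	unsigned int opcode;
	enum cmd_effect effect;
};

/* Hypothetical opcode table; a real driver would derive this from its FW spec. */
static const struct fw_cmd_desc cmd_table[] = {
	{ 0x100, CMD_READ_SHARED },	/* query memory error counters */
	{ 0x101, CMD_CLEAR_SHARED },	/* clear memory error counters */
	{ 0x102, CMD_WRITE_GLOBAL },	/* program the global scrubber unit */
};

static const struct fw_cmd_desc *lookup_cmd(unsigned int opcode)
{
	for (size_t i = 0; i < sizeof(cmd_table) / sizeof(cmd_table[0]); i++)
		if (cmd_table[i].opcode == opcode)
			return &cmd_table[i];
	return NULL;
}

/* Returns true only for commands that are safe to share with a subsystem driver. */
bool example_cmd_allowed(unsigned int opcode)
{
	const struct fw_cmd_desc *cmd = lookup_cmd(opcode);

	if (!cmd)
		return false;	/* unknown commands are denied by default */

	switch (cmd->effect) {
	case CMD_READ_SHARED:
		return true;	/* concurrent reads do not disturb EDAC */
	case CMD_CLEAR_SHARED:
	case CMD_WRITE_GLOBAL:
		return false;	/* would conflict with a functional subsystem */
	}
	return false;
}
```

The default-deny on unknown opcodes is a design choice in this sketch:
a command the driver cannot classify is treated as potentially global.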