On Wed, Feb 08, 2023 at 10:03:57AM -0800, Viacheslav A.Dubeyko wrote:
>
>
> > On Feb 8, 2023, at 8:38 AM, Adam Manzanares <a.manzanares@xxxxxxxxxxx> wrote:
> >
> > On Thu, Feb 02, 2023 at 09:54:02AM +0000, Jonathan Cameron wrote:
> >> On Wed, 1 Feb 2023 12:04:56 -0800
> >> "Viacheslav A.Dubeyko" <viacheslav.dubeyko@xxxxxxxxxxxxx> wrote:
> >>
> >>>>
>
> <skipped>
>
> >>>
> >>> Most probably, we will have multiple FM implementations in firmware.
> >>> Yes, FM on host could be important for debug and to verify correctness
> >>> firmware-based implementations. But FM daemon on host could be important
> >>> to receive notifications and react somehow on these events. Also, journalling
> >>> of events/messages/events could be important responsibility of FM daemon
> >>> on host.
> >>
> >> I agree with an FM daemon somewhere (potentially running on the BMC type chip
> >> that also has the lower level FM-API access). I think it is somewhat
> >> separate from the rest of this on basis it may well just be talking redfish
> >> to the FM and there are lots of tools for that sort of handling already.
> >>
> >
> > I would be interested in participating in a BOF about this topic. I wonder what
> > happens when we have multiple switches with multiple FMs each on a separate BMC.
> > In this case, does it make more sense to have an owner of the global FM state
> > be a user space application. Is this the job of the orchestrator?
> >
> > The BMC based FM seems to have scalability issues, but will we hit them in
> > practice any time soon.
>
> I had discussion recently and it looks like there are interesting points:
> (1) If we have multiple CXL switches (especially with complex hierarchy), then it is
> very compute-intensive activity. So, potentially, FM on firmware side could be not
> capable to digest and executes all responsibilities without potential performance
> degradation.
> (2) However, if we have FM on host side, then there is security concerns because
> FM sees everything and all details of multiple hosts and subsystems.
> (3) Technically speaking, there is one potential capability that user-space FM daemon
> can run as on host side as on CXL switch side. I mean here that if we implement
> user-space FM daemon, then it could be used to execute FM functionality on CXL
> switch side (maybe????). :)
>
> <skipped>
>
> >>>>> - Manage surprise removal of devices
> >>>>
> >>>> Likewise, beyond reporting I wouldn't expect the FM daemon to have any idea
> >>>> what to do in the way of managing this. Scream loudly?
> >>>>
> >>>
> >>> Maybe, it could require application(s) notification. Let’s imagine that application
> >>> uses some resources from removed device. Maybe, FM can manage kernel-space
> >>> metadata correction and helping to manage application requests to not existing
> >>> entities.
> >>
> >> Notifications for the host are likely to come via inband means - so type3 driver
> >> handling rather than related to FM. As far as the host is concerned this is the
> >> same as case where there is no FM and someone ripped a device out.
> >>
> >> There might indeed be meta data to manage, but doubt it will have anything to
> >> do with kernel.
> >>
> >
> > I've also had similar thoughts, I think the OS responds to notifications that
> > are generated in-band after changes to the state of the FM are made through
> > OOB means.
> >
> > I envision the host sends REDFISH requests to a switch BMC that has an FM
> > implementation. Once the changes are implemented by the FM it would show up
> > as changes to the PCIe hierarchy on a host, which is capable of responding to
> > such changes.
> >
>
> I think I am not completely follow your point. :) First of all, I assume that if host
> sends REDFISH request, then it will be expected the confirmation of request execution.
> It means for me that host needs to receive some packet that informs that request
> executed successfully or failed. It means that some subsystem or application requested
> this change and only after receiving the confirmation requested capabilities can be used.
> And if FM is on CXL switch side, then how FM will show up the changes? It sounds for me
> that some FM subsystem should be on the host side to receive confirmation/notification
> and to execute the real changes in PCIe hierarchy. Am missing something here?

Hopefully I have a point ;). I do expect a host to receive a response for a
given REDFISH request, but the request/response would be OOB. I would go back
to the example of hot plugging a PCIe based device. For example, if an NVMe SSD
is hot plugged, then the OS is notified by HW that a new PCIe device has been
added.

Going back to changes made by the FM, if the changes impact the CXL hierarchy
that is visible to a host, it is my expectation that the host OS will be
informed of the changes requested of the FM when the host HW becomes aware of
the changes (the in-band change).

>
> Thanks,
> Slava.
>
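
To make the OOB versus in-band split a bit more concrete, here is a rough
Python sketch of the flow I have in mind. It is only a toy model: the names
Host, SwitchFM, oob_bind_request and on_hotplug are made up for illustration
and do not correspond to any real Redfish or FM API, and the Redfish exchange
is reduced to a function call that returns a dict.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Host:
    """Hypothetical host; only records the in-band events it sees."""
    name: str
    inband_events: List[str] = field(default_factory=list)

    def on_hotplug(self, device: str) -> None:
        # In-band path: host HW notices the hierarchy change and the OS
        # reacts, much like it would for a hot-plugged NVMe SSD.
        self.inband_events.append(f"hot-add {device}")


@dataclass
class SwitchFM:
    """Hypothetical FM running on the switch BMC (an assumption here)."""
    hosts: Dict[str, Host] = field(default_factory=dict)

    def attach(self, host: Host) -> None:
        self.hosts[host.name] = host

    def oob_bind_request(self, host_name: str, device: str) -> dict:
        # OOB path: whoever sent the request gets a response back.
        host = self.hosts.get(host_name)
        if host is None:
            return {"status": "failed", "reason": "unknown host"}
        # The FM applies the change; the host learns about it in-band,
        # not from the OOB response.
        host.on_hotplug(device)
        return {"status": "ok", "device": device}


fm = SwitchFM()
host = Host("host0")
fm.attach(host)

# OOB request/response, e.g. issued by an orchestrator.
resp = fm.oob_bind_request("host0", "cxl-mem0")
print(resp["status"], host.inband_events)
```

The point of the sketch is that the confirmation goes back to the OOB
requester, while the host OS only ever sees the resulting hot-add event.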