> On Feb 10, 2023, at 4:32 AM, Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:
>
> On Thu, 9 Feb 2023 14:04:13 -0800
> "Viacheslav A.Dubeyko" <viacheslav.dubeyko@xxxxxxxxxxxxx> wrote:
>
>>> On Feb 9, 2023, at 3:05 AM, Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:
>>>
>>> On Wed, 8 Feb 2023 10:03:57 -0800
>>> "Viacheslav A.Dubeyko" <viacheslav.dubeyko@xxxxxxxxxxxxx> wrote:
>>>
>>>>> On Feb 8, 2023, at 8:38 AM, Adam Manzanares <a.manzanares@xxxxxxxxxxx> wrote:
>>>>>
>>>>> On Thu, Feb 02, 2023 at 09:54:02AM +0000, Jonathan Cameron wrote:
>>>>>> On Wed, 1 Feb 2023 12:04:56 -0800
>>>>>> "Viacheslav A.Dubeyko" <viacheslav.dubeyko@xxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>>>
>>>>
>>>> <skipped>
>>>>
>>>>>>>
>>>>>>> Most probably, we will have multiple FM implementations in firmware.
>>>>>>> Yes, an FM on the host could be important for debugging and for verifying
>>>>>>> the correctness of firmware-based implementations. But an FM daemon on the
>>>>>>> host could also be important for receiving notifications and reacting to
>>>>>>> these events. Also, journalling of events/messages could be an important
>>>>>>> responsibility of the FM daemon on the host.
>>>>>>
>>>>>> I agree with an FM daemon somewhere (potentially running on the BMC type chip
>>>>>> that also has the lower level FM-API access). I think it is somewhat
>>>>>> separate from the rest of this, on the basis that it may well just be talking
>>>>>> Redfish to the FM, and there are lots of tools for that sort of handling already.
>>>>>>
>>>>>
>>>>> I would be interested in participating in a BOF about this topic. I wonder what
>>>>> happens when we have multiple switches with multiple FMs, each on a separate BMC.
>>>>> In this case, does it make more sense to have the owner of the global FM state
>>>>> be a user space application? Is this the job of the orchestrator?
>>>
>>> This partly comes down to terminology. Ultimately there is an FM that is
>>> responsible for the whole fabric (could be distributed software) and that
>>> in turn will talk to the various BMCs that then talk to the switches.
>>>
>>> Depending on the setup it may not be necessary for any entity to see the
>>> whole fabric.
>>>
>>> Interesting point in general though. I think it boils down to getting
>>> layering in any software correct, and that is easier done from the outset.
>>>
>>> I don't know whether the Redfish stuff is flexible enough to cover this, but
>>> if it is, I'd envision the actual FM talking Redfish to a bunch of sub-FMs
>>> and in turn presenting Redfish to the orchestrator.
>>>
>>> Any of these components might run on separate machines, or in firmware on
>>> some device, or indeed all run on one server that is acting as the FM and
>>> a node in the orchestrator layer.
>>>
>>>>>
>>>>> The BMC based FM seems to have scalability issues, but will we hit them in
>>>>> practice any time soon?
>>>
>>> Who knows ;) If anyone builds the large scale fabric stuff in CXL 3.0 then
>>> we definitely will in the medium term.
>>>
>>>>
>>>> I had a discussion recently, and it looks like there are some interesting points:
>>>> (1) If we have multiple CXL switches (especially with a complex hierarchy), then this is
>>>> a very compute-intensive activity. So, potentially, an FM on the firmware side may not be
>>>> capable of digesting and executing all of its responsibilities without performance
>>>> degradation.
>>>
>>> There is firmware and there is firmware ;) It's not uncommon for BMCs to be
>>> significant devices in their own right and run Linux or other heavyweight OSes.
>>>
>>>> (2) However, if we have the FM on the host side, then there are security concerns, because
>>>> the FM sees everything and all the details of multiple hosts and subsystems.
>>>
>>> Agreed. Other than testing I wouldn't expect the FM to run on a 'host', but in
>>> at least some implementations it will be running on a capable Linux machine.
>>> In large fabrics that may be very capable indeed (basically a server dedicated to
>>> this role).
>>>
>>>> (3) Technically speaking, there is one potential capability: a user-space FM daemon
>>>> can run on the host side as well as on the CXL switch side. I mean here that if we
>>>> implement a user-space FM daemon, then it could be used to execute FM functionality
>>>> on the CXL switch side (maybe????). :)
>>>
>>> Sure, anything could run anywhere. We should draw up some 'reference' architectures
>>> though to guide discussion down the line. Mind you, I think there are a lot of
>>> steps along the way and the starting point should be a simple PoC where all the FM
>>> stuff is in Linux userspace (other than comms). That's easy enough to do.
>>> If I get a quiet week or so I'll hammer out what we need on the emulation side to
>>> start playing with this.
>>>
>>> Jonathan
>>>
>>>
>>>
>>>>
>>>> <skipped>
>>>>
>>>>>>>>> - Manage surprise removal of devices
>>>>>>>>
>>>>>>>> Likewise, beyond reporting I wouldn't expect the FM daemon to have any idea
>>>>>>>> what to do in the way of managing this. Scream loudly?
>>>>>>>>
>>>>>>>
>>>>>>> Maybe it could require notifying the application(s). Let's imagine that an application
>>>>>>> uses some resources from the removed device. Maybe the FM can manage kernel-space
>>>>>>> metadata correction and help to manage application requests to non-existent
>>>>>>> entities.
>>>>>>
>>>>>> Notifications for the host are likely to come via in-band means - so type3 driver
>>>>>> handling rather than related to FM. As far as the host is concerned this is the
>>>>>> same as the case where there is no FM and someone ripped a device out.
>>>>>>
>>>>>> There might indeed be metadata to manage, but I doubt it will have anything to
>>>>>> do with the kernel.
>>>>>>
>>>>>
>>>>> I've also had similar thoughts. I think the OS responds to notifications that
>>>>> are generated in-band after changes to the state of the FM are made through
>>>>> OOB means.
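To make the out-of-band/in-band split described above concrete, here is a minimal Python sketch under stated assumptions: a requester sends a bind request to the FM over an out-of-band channel and gets back an explicit success/failure result, while the host only learns about the new device through its own PCIe hierarchy (modeled as a hot-add callback). All class and method names (`BindRequest`, `FabricManager.handle_oob`, `Host.on_inband_hot_add`, and so on) are hypothetical and do not correspond to FM-API commands or to any kernel interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical message types; these are not FM-API or Redfish structures.
@dataclass
class BindRequest:
    host_id: str  # host whose hierarchy should gain the device
    port: int     # switch downstream port to bind


@dataclass
class BindResult:
    ok: bool
    reason: str = ""


class Host:
    """Host OS stand-in: it sees fabric changes only in-band."""

    def __init__(self, host_id: str):
        self.host_id = host_id
        self.visible_ports: List[int] = []

    def on_inband_hot_add(self, port: int) -> None:
        # Models a PCIe hot-add showing up in the host's own hierarchy.
        self.visible_ports.append(port)


class Switch:
    """Switch stand-in: binding a port makes it appear under a host."""

    def __init__(self, num_ports: int):
        self.owner: Dict[int, Optional[str]] = {p: None for p in range(num_ports)}
        self.upstream: Dict[str, Host] = {}

    def connect(self, host: Host) -> None:
        self.upstream[host.host_id] = host

    def bind(self, port: int, host_id: str) -> None:
        self.owner[port] = host_id
        # In-band path: the host observes the change itself, not an FM message.
        self.upstream[host_id].on_inband_hot_add(port)


class FabricManager:
    """Toy FM: validates an OOB request, reconfigures the switch, replies."""

    def __init__(self, switch: Switch):
        self.switch = switch

    def handle_oob(self, req: BindRequest) -> BindResult:
        if req.port not in self.switch.owner:
            return BindResult(False, "no such port")
        if req.host_id not in self.switch.upstream:
            return BindResult(False, "unknown host")
        if self.switch.owner[req.port] is not None:
            return BindResult(False, "port already bound")
        self.switch.bind(req.port, req.host_id)
        # OOB path: explicit confirmation back to whoever sent the request.
        return BindResult(True)


if __name__ == "__main__":
    host = Host("host0")
    switch = Switch(num_ports=4)
    switch.connect(host)
    fm = FabricManager(switch)
    print(fm.handle_oob(BindRequest("host0", 2)))  # BindResult(ok=True, reason='')
    print(host.visible_ports)                      # [2]
    print(fm.handle_oob(BindRequest("host0", 2)))  # BindResult(ok=False, reason='port already bound')
```

In this model, whoever sent the request gets an explicit confirmation on the OOB reply path, and the host kernel only handles an ordinary in-band hot-add, as it would with no FM present.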
>>>>>
>>>>> I envision the host sends Redfish requests to a switch BMC that has an FM
>>>>> implementation. Once the changes are implemented by the FM, they would show up
>>>>> as changes to the PCIe hierarchy on a host, which is capable of responding to
>>>>> such changes.
>>>>>
>>>>
>>>> I don't think I completely follow your point. :) First of all, I assume that if the host
>>>> sends a Redfish request, then it will expect a confirmation of the request's execution.
>>>> It means for me that the host needs to receive some packet informing it that the request
>>>> executed successfully or failed. It means that some subsystem or application requested
>>>> this change, and only after receiving the confirmation can the requested capabilities be used.
>>>> And if the FM is on the CXL switch side, then how will the FM make the changes show up? It
>>>> sounds to me like some FM subsystem should be on the host side to receive the
>>>> confirmation/notification and to execute the real changes in the PCIe hierarchy.
>>>> Am I missing something here?
>>>
>>> Another terminology issue, I think. The FM, from the CXL side of things, is an abstract thing
>>> (potentially highly layered / distributed) that acts on instructions from an
>>> orchestrator (also potentially highly distributed; in one implementation the hosts
>>> can be the orchestrator) and configures the fabric.
>>> The downstream APIs to the switches and EPs are all in FM-API (CXL spec).
>>> Upstream is probably all Redfish. What happens in between is impdef (though
>>> obviously mapping to Redfish or FM-API as applicable may make it more
>>> reusable and flexible).
>>>
>>> I think some diagrams of what is where will help.
>>> I think we need the following (note I've always kept the controller hosts as normal hosts
>>> as well, as that includes the case where it never uses the Fabric - so BMC type cases are
>>> a subset without needing to double the number of diagrams):
>>>
>>> 1) Diagram of a single host with the FM as one 'thing' on that host - direct interfaces
>>> to a single switch - interface options include switch CCI MB, MCTP over PCIe VDM,
>>> MCTP over, say, I2C.
>>>
>>> 2) Diagram of the same as above, with a multi-headed device all connected to one host.
>>>
>>> 3) Diagram of 1 (maybe with MHD below switches), but now with multiple hosts,
>>> one of which is responsible for fabric management (FM and orchestrator in that
>>> manager host) - agents on other hosts are able to send requests for services to that host.
>>>
>>> 4) Diagram of 3, but now with multiple switches, each with a separate controlling host.
>>> Some other hosts don't have any fabric control.
>>> Distributed FM across the controlling hosts.
>>>
>>> 5) Diagram of 4, but with a layered FM and a separate orchestrator. Hosts all talk to the
>>> orchestrator, which then talks to the FM.
>>>
>>> 6) 4, but push some management entities down into switches (from an architecture point of
>>> view this is no different from the layered case with a separate BMC per switch - there
>>> is still either a distributed FM or a layered FM, which the orchestrator talks to).
>>>
>>> We can mess with exactly who does what across the various layers.
>>>
>>> I can sketch this lot up (and that will probably make some gaps in these cases apparent),
>>> but it will take a little while, hence text descriptions in the meantime.
>>>
>>> I come back to my personal view though - which is don't worry too much at this early
>>> stage, beyond making sure we have some layering in code so that we can distribute
>>> it across a distributed or layered architecture later!
>>>
>>
>> I had a slightly more simplified image in my mind. :) We definitely need to have diagrams
>> to clarify the vision. But which collaboration tool could we use to work publicly on diagrams?
>> Any suggestions?
>
> ASCII art :) To have a broad discussion it needs to be the mailing list, and that
> is effectively the only option.
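The point about layering in code can be illustrated with a small Python sketch, assuming every FM layer exposes the same upstream interface (standing in for Redfish): a per-switch FM that would issue FM-API commands, and a layered FM that only delegates to its children. Because both implement one interface, the same caller works for a single-switch setup (case 1) or a layered one (cases 5 and 6). `FabricManagerAPI`, `SwitchFM`, `LayeredFM` and the `"sw0/p0"` port naming are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class FabricManagerAPI(ABC):
    """Hypothetical upstream interface shared by every FM layer."""

    @abstractmethod
    def list_ports(self) -> List[str]: ...

    @abstractmethod
    def bind(self, port: str, host: str) -> bool: ...


class SwitchFM(FabricManagerAPI):
    """Bottom layer: manages one switch (would speak FM-API to it)."""

    def __init__(self, switch_name: str, num_ports: int):
        self.ports = [f"{switch_name}/p{i}" for i in range(num_ports)]
        self.bindings: Dict[str, str] = {}

    def list_ports(self) -> List[str]:
        return list(self.ports)

    def bind(self, port: str, host: str) -> bool:
        if port not in self.ports or port in self.bindings:
            return False
        # Placeholder for the FM-API command(s) a real implementation would send.
        self.bindings[port] = host
        return True


class LayeredFM(FabricManagerAPI):
    """Upper layer: same interface, but only delegates to sub-FMs."""

    def __init__(self, children: List[FabricManagerAPI]):
        self.children = children

    def list_ports(self) -> List[str]:
        return [p for child in self.children for p in child.list_ports()]

    def bind(self, port: str, host: str) -> bool:
        for child in self.children:
            if port in child.list_ports():
                return child.bind(port, host)
        return False


if __name__ == "__main__":
    fabric = LayeredFM([SwitchFM("sw0", 2), SwitchFM("sw1", 2)])
    print(fabric.list_ports())             # ['sw0/p0', 'sw0/p1', 'sw1/p0', 'sw1/p1']
    print(fabric.bind("sw1/p0", "host3"))  # True, handled by the sw1 sub-FM
```

Since a `LayeredFM` is itself a `FabricManagerAPI`, layers can nest, and any layer could later sit behind a network boundary (a BMC, a separate server) without its callers changing.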
>

I tried to prepare a diagram based on ASCII art. :) It looks pretty terrible in email:

 ------------------------------           ----------------
|  ---------         ------    |         |                |
|  | Agent | <-----> | FM |    |         |                |
|  ---------         ------    |<------->|   CXL switch   |
|             Host             |         |                |
|                              |         |                |
 ------------------------------           ----------------

I think we need to use some online resource anyway. Adam and I are discussing what we
can do here.

You introduced the Orchestrator entity. I realized that I don't completely follow the
responsibilities of this subsystem. Do you imply some central point of management of
multiple FM instances? Something like a router that has a knowledge base and can redirect
a request to the proper FM instance. Am I correct? It sounds to me like the orchestrator
needs to implement some sub-API of the FM. Or, maybe, it needs to parse Redfish packets,
for example, and only redirect the packets.

Thanks,
Slava.
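The "router" reading of the orchestrator can be sketched in Python as follows, as an assumption rather than a design anyone in the thread has proposed: the orchestrator keeps a table from fabric ID to FM instance, parses only the target path of a Redfish-style request, and forwards the request unchanged. The path layout (`/redfish/v1/Fabrics/<fabric_id>/...`), the request dict shape, and the `Orchestrator` class are illustrative assumptions, not a real Redfish client.

```python
from typing import Callable, Dict

# Hypothetical: an FM instance is any callable taking a request dict and
# returning a response dict (in practice this would be a Redfish client).
FMEndpoint = Callable[[dict], dict]


class Orchestrator:
    """Routes requests to the FM instance that owns the targeted fabric."""

    def __init__(self) -> None:
        self.routes: Dict[str, FMEndpoint] = {}

    def register(self, fabric_id: str, fm: FMEndpoint) -> None:
        self.routes[fabric_id] = fm

    def handle(self, request: dict) -> dict:
        # Only the target path is inspected; the request is forwarded unchanged.
        # Assumed layout: /redfish/v1/Fabrics/<fabric_id>/...
        parts = request["path"].strip("/").split("/")
        if len(parts) < 4 or parts[:3] != ["redfish", "v1", "Fabrics"]:
            return {"status": 400, "error": "not a fabric resource"}
        fm = self.routes.get(parts[3])
        if fm is None:
            return {"status": 404, "error": f"no FM registered for {parts[3]}"}
        return fm(request)


if __name__ == "__main__":
    orch = Orchestrator()
    orch.register("fabricA", lambda req: {"status": 200, "handled_by": "fm-a"})
    print(orch.handle({"path": "/redfish/v1/Fabrics/fabricA/Switches/sw0",
                       "method": "GET"}))  # {'status': 200, 'handled_by': 'fm-a'}
```

This corresponds to the "parse and only redirect" option. The other option mentioned above, where the orchestrator implements a sub-API of the FM, would translate requests into FM operations instead of forwarding them opaquely.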