On Thu, Feb 02, 2023 at 09:54:02AM +0000, Jonathan Cameron wrote:
> On Wed, 1 Feb 2023 12:04:56 -0800
> "Viacheslav A.Dubeyko" <viacheslav.dubeyko@xxxxxxxxxxxxx> wrote:
>
> > Hi Jonathan,
> >
> > > On Jan 31, 2023, at 9:41 AM, Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:
> > >
> > > On Mon, 30 Jan 2023 11:11:23 -0800
> > > "Viacheslav A.Dubeyko" <viacheslav.dubeyko@xxxxxxxxxxxxx> wrote:
> > >
> > >> Hello,
> > >
> > > Hi Slava,
> > >
> > > I'll throw some opinions at this :)
> > >
> > >>
> > >> I would like to suggest a Fabric Manager (FM) architecture discussion. As far as I can see,
> > >> FM architecture requires: (1) FM configuration tool, (2) FM daemon, (3) QEMU emulation
> > >> of CXL hardware features. FM daemon receives requests from configuration tool and
> > >> executes commands by means of interaction with kernel-space subsystem and CXL switch
> > >> (that can be emulated by QEMU). So, the key questions for discussion:
> > >
> > > Worth describing operating modes to be supported: You kind of cover this later
> > > but I think pulling it out makes it clearer that we want one bit of software to
> > > do several different things.
> > >
> > > 1) FM separate from hosts and talked to by higher level orchestration software
> > >    but using a Switch CCI or MHD mailbox (over PCI)
> > >    This one is fairly easy because any security / shooting self in foot problems
> > >    are an issue for higher level software.
> > > 2) FM on host. Probably mostly going to be relevant for debug but may use
> > >    the same mailbox as is being used by the existing CXL drivers (for Multi
> > >    Head Device it might be the end point mailbox, for Multi Logical Device
> > >    behind a switch it might be the switch mailbox).
> > > 3) All out of band (MCTP or similar - want some shared code, but no
> > >    need for anything in kernel as far as I can tell).
> > >
> >
> > Most probably, we will have multiple FM implementations in firmware.
> > Yes, FM on host could be important for debug and to verify correctness of
> > firmware-based implementations. But FM daemon on host could be important
> > to receive notifications and react somehow to these events. Also, journalling
> > of events/messages could be an important responsibility of FM daemon
> > on host.
>
> I agree with an FM daemon somewhere (potentially running on the BMC type chip
> that also has the lower level FM-API access). I think it is somewhat
> separate from the rest of this on the basis it may well just be talking redfish
> to the FM and there are lots of tools for that sort of handling already.
>

I would be interested in participating in a BOF about this topic. I wonder
what happens when we have multiple switches with multiple FMs, each on a
separate BMC. In this case, does it make more sense to have the owner of the
global FM state be a user space application? Is this the job of the
orchestrator? The BMC based FM seems to have scalability issues, but will we
hit them in practice any time soon?

> >
> > >
> > >> (1) How to distribute functionality between user-space and kernel-space?
> > >
> > > Kernel for transport if mailbox based (switch or MHD).
> > > Possibly help in kernel with the host to Multiheaded device FM LD tunneling
> > > and host to switch to Multi Logical Device - Logical Device tunneling
> > > but that could also be left to userspace.
> > >
> >
> > People love to move everything into user-space now. But I believe we could have
> > both kernel-space and user-space solutions. I think we need to check which way would be
> > the more efficient and elegant solution.
>
> Agreed - though I think we need to remember running this on the host that is
> using the devices isn't likely to be a common actual usecase. So we should
> design for that to 'work' but not to be the assumed method. Hence if any
> sync type activity is needed it might be a case of don't do the wrong thing
> rather than hard protections.
>
> >
> > > If MCTP, use the existing MCTP framework which is underlying transport independent.
> > > I posted a PoC for how this might work a while ago (hack on top of MCTP-I2C
> > > and some emulation) In the cover letter of the emulation PoC
> > >
> >
> > Sounds interesting. Let me check it. But I believe it might not be the first task
> > in this implementation. :)
>
> Some level of MCTP support needs to be early enough that we don't get
> any design decisions wrong. For MCTP I think the vast majority of handling
> has to be in userspace. I don't want to end up with duplication because we did
> some of that down in the kernel for the mailbox solution.
>
> >
> > >
> > > I think everything else belongs in userspace. I believe there are redfish APIs
> > > etc that would then be used to query and drive the userspace program from an
> > > orchestrator or similar level software.
> > >
> >
> > I need to check the redfish API. It sounds reasonable to employ some existing
> > framework.
> >
> > >> (2) Which functionality does kernel-space need to provide for implementing FM features?
> > >> Which kernel-space functionality do we still need to implement?
> > >
> > > Very little needed if we just expose the transport via PCI mailboxes.
> > > There is a possible concern that FM-API commands are frequently
> > > destructive and currently we don't let userspace poke destructive
> > > commands. That may just need a specific opt in to say we know we
> > > can shoot ourselves in the foot.
> > >
> >
> > I think this is why we need the kernel. It sounds to me that we have to have user-space
> > and kernel-space collaboration here.
>
> I think it will be lightweight and look like the existing CXL mailbox userspace
> interface (some commands are the same).
>
> >
> > >> (3) Do we need MCTP (Management Component Transport Protocol) or can some other
> > >> protocol be used for interaction between configuration tool, FM daemon, and
> > >> CXL switch?
> > >
> > > Yes MCTP is needed.
> > > I don't think we want the actual management code to be different
> > > depending on transport / protocol. However we might layer it so that there
> > > is an interface program that sits between the management library / program and
> > > the FM-API transport.
> > >
> > > Note I was struggling to find a suitable MCTP interface to emulate - so would
> > > welcome suggestions on that. I hacked the above PoC using an aspeed i2c
> > > controller that supported the right magic combination of features needed
> > > for MCTP over I2C but it doesn't have ACPI support which rather limits
> > > usage (and I doubt anyone will be keen on adding ACPI support just to
> > > test CXL related code :) If anyone knows of a suitable MCTP host we
> > > could use for this that would be great (MCTP over PCI VDM might be nice for
> > > example)
> > >
> >
> > Let us start some command/feature implementation and we will figure it out.
> > But, I assume we need to start from something like CXL device discovery at first.
>
> Sure - some of the kernel side of that was present in the switch-cci mailbox PoC
> Obviously tooling was a test hack though ;)
>
> >
> > >> (4) What architecture does the FM implementation require?
> > >> (5) Does it make sense to use Rust as the implementation language?
> > >
> > > Take your pick ;) First person to write a lot of code gets to pick the language.
> > >
> >
> > Yeah, I see the point. Rust can provide some benefits (memory safety model, for example).
> > But it could introduce some issues with collaboration and make implementation
> > slower. Everybody develops in C, and switching to Rust might not be such an easy
> > target.
> >
> > <skipped>
> >
> > >>
> > >>
> > >> FM configuration tool requires such commands:
> > >
> > > A command line tool is fine, but likely the 'real' FM configuration interface will be via
> > > a protocol (e.g. redfish).
> > > There is a WIP for CXL, though not sure on latest status on this (document on there is from
> > > 2021)
> > >
> > > So ultimately I'd expect fm_cli to be a wrapper around libredfish / redfishtool
> > > that just makes it a bit easier to poke
> > > with common commands.
> > >
> > > I'm far from an expert of redfish so may have this all wrong.
> > >
> >
> > Sounds reasonable to me. Let me check how good it could be for this project.
> >
> > >>
> > >> Discover - discover available agents
> > >> Subcommands:
> > >> - fm_cli discover fm - discover FM instances
> > >
> > > If we are allowing more than one FM then I'd expect all the
> > > other commands to be directed at that by some sort of FM specific
> > > ID. If only one, what does this command do that isn't better
> > > done with fm get_info?
> > >
> >
> > Yes, we need to identify every object somehow. And it’s an interesting point.
> > From one point of view, some human-friendly names could be good.
> > But firmware-based FM implementation needs to follow the same rules.
> > And it sounds to me that CXL specification should define how CXL FM or
> > CXL device identifies itself. Anyway, we need to ask CXL device and it should
> > return to us some ID. Probably, it will be some GUID or similar number.
> >
> > >
> > >> - fm_cli discover cxl_devices - discover CXL devices
> > >> - fm_cli discover logical_devices - discover logical devices
> > >
> > > Discover switches as well.
> > >
> >
> > I assumed that CXL switch is a subclass of CXL devices. Do you mean that
> > it is an independent case?
>
> Maybe simpler broken out. What you do with a switch is often very different
> from type 3 devices.
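
A minimal sketch of how the discover subcommands above could be laid out,
with switches broken out as their own target and a global selector for
pointing at one FM when several exist. Python with argparse is used purely
for illustration. The `--fm-id` option, the `switches` target name, and the
function names are assumptions, not part of the proposal or of the CXL
specification, and nothing here talks to real hardware.

```python
import argparse

# Discovery targets from the proposal, plus "switches" broken out
# separately as suggested in the thread (the name is an assumption).
DISCOVER_TARGETS = ("fm", "cxl_devices", "logical_devices", "switches")


def build_parser():
    """Build a hypothetical fm_cli parser (illustrative only)."""
    parser = argparse.ArgumentParser(prog="fm_cli")
    # Assumed global selector so commands can be directed at one FM.
    # The real ID format (GUID or otherwise) is still undecided.
    parser.add_argument("--fm-id", default=None,
                        help="ID of the FM instance to talk to")
    commands = parser.add_subparsers(dest="command", required=True)

    discover = commands.add_parser("discover",
                                   help="discover available agents")
    discover.add_argument("target", choices=DISCOVER_TARGETS)
    return parser


def run(argv):
    """Parse argv and return a description of the request; does no I/O."""
    args = build_parser().parse_args(argv)
    return {"command": args.command,
            "target": args.target,
            "fm_id": args.fm_id}
```

With this layout, `run(["--fm-id", "fm0", "discover", "switches"])` yields a
request tagged with the FM it is meant for, and every later command group
(fm, switch, dcd) could reuse the same selector.
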
>
> >
> > >>
> > >> FM - manage Fabric Manager
> > >> Subcommands:
> > >> - fm_cli fm get_info - get FM status/info
> > >> - fm_cli fm start - start FM instance
> > >> - fm_cli fm restart - restart FM instance
> > >> - fm_cli fm stop - stop FM instance
> > >> - fm_cli fm get_config - get FM configuration
> > >> - fm_cli fm set_config - set FM configuration
> > >
> > > I'd keep this slim for now. No idea what FM config we might want to
> > > set so don't bother listing command yet.
> > >
> >
> > Yeah, it’s not completely clear yet. But I assume we can consider
> > configuration options like:
> > (1) register to receive event notifications
> > (2) logging of events
> > (3) error handling
> >
> > >> - fm_cli fm get_events - get event records
> > > Not sure what FM would have in the way of events (as opposed to
> > > things it is talking to).
> > >
> >
> > I think FM can log events. If we consider FM daemon on host, then it
> > could issue messages to the end user as a reaction to some events.
> >
> > >>
> > >> Switch - manage CXL switch
> > >> Subcommands:
> > >> - fm_cli switch get_info - get CXL switch info/status
> > >
> > > These all need an ID field of some type to identify which switch.
> > >
> >
> > Yeah, it is exactly what we need for every command. We need to identify
> > an object for a request.
> >
> > >> - fm_cli switch get_config - get switch configuration
> > >> - fm_cli switch set_config - set switch configuration
> >
> > <skipped>
> >
> > >>
> > >> DCD (Dynamic Capacity Device) - manage Dynamic Capacity Device
> > >> Subcommands:
> > >> - fm_cli dcd get_info - Get DCD Info (retrieves the number of supported hosts,
> > >>   total Dynamic Capacity of the device, and supported region configurations)
> > >> - fm_cli dcd get_capacity_config - Get Host Dynamic Capacity Region Configuration
> > >>   (retrieves the Dynamic Capacity configuration for a specified host)
> > >> - fm_cli dcd set_capacity_config - Set Dynamic Capacity Region Configuration
> > >>   (sets the configuration of a DC Region)
> > >> - fm_cli dcd get_extent_list - Get DCD Extent Lists (retrieves the Dynamic Capacity
> > >>   Extent List for a specified host)
> > >> - fm_cli dcd add_capacity - Initiate Dynamic Capacity Add (initiates the addition of
> > >>   Dynamic Capacity to the specified region on a host)
> > >
> > > That one is complex ;) Probably needs a whole man page to itself.
> > >
> >
> > Currently, it’s only a declaration of the command set. Yeah, implementation will be complex. :)
> >
> > >> - fm_cli dcd release_capacity - Initiate Dynamic Capacity Release (initiates the release of
> > >>   Dynamic Capacity from a host)
> > >>
> > >> FM daemon receives requests from configuration tool and executes commands by means of
> > >> interaction with kernel-space subsystems. The responsibility of FM daemon could be:
> > >> - Execute configuration tool commands
> > >> - Manage hot-add and hot-removal of devices
> > >
> > > In what sense? I'd expect it to notify some higher level entity
> > > (orchestrator or similar) but not sure I see what management the
> > > FM would do.
> > >
> >
> > I assume that if FM manages some metadata, then hot-add or hot-removal could
> > require some metadata corrections. Also, hot-add and hot-removal can generate some
> > events that FM can receive and process somehow.
> > For example, it is possible to log
> > event messages into some journal.
>
> Ok. Potentially stuff there - though exactly which layer ends up managing this
> stuff isn't obvious to me yet.
>
> >
> > >> - Manage surprise removal of devices
> > >
> > > Likewise, beyond reporting I wouldn't expect the FM daemon to have any idea
> > > what to do in the way of managing this. Scream loudly?
> > >
> >
> > Maybe, it could require application(s) notification. Let’s imagine that an application
> > uses some resources from a removed device. Maybe FM can manage kernel-space
> > metadata correction and help to manage application requests to non-existent
> > entities.
>
> Notifications for the host are likely to come via inband means - so type3 driver
> handling rather than related to FM. As far as the host is concerned this is the
> same as the case where there is no FM and someone ripped a device out.
>
> There might indeed be meta data to manage, but doubt it will have anything to
> do with the kernel.
>

I've also had similar thoughts. I think the OS responds to notifications that
are generated in-band after changes to the state of the FM are made through
OOB means. I envision the host sending Redfish requests to a switch BMC that
has an FM implementation. Once the changes are implemented by the FM, they
would show up as changes to the PCIe hierarchy on a host, which is capable of
responding to such changes.

> >
> > >> - Receive and handle event notifications from the CXL switch
> > >> - Logging events
> > >> - Memory allocation and QoS Telemetry management
> > >> - Error/Failure handling
> > >
> > > I'm not sure on separation of role between this component and
> > > higher level policy / admin driven software.
> > >
> > > For memory allocation it might take a 'give host A this much
> > > memory with this characteristic set' command and own the
> > > allocations across all present devices, or it might just
> > > act as an interface layer to higher level software that does
> > > the fine detail of figuring out which device to allocate memory
> > > from to satisfy such a request.
> > >
> > > Whilst I agree having a broad vision for an interface is good
> > > there are a lot of subtle details in some of these commands
> > > so I'd not spend too long refining the whole lot. Probably better
> > > to look at them one at a time and then just have whoever ends
> > > up maintaining / reviewing this thing responsible for making sure the
> > > parameter format etc is consistent across commands.
> > >
> >
> > Yes, I agree. Let’s do it step by step. I believe we need to start by
> > implementing an application that processes commands and does nothing
> > at first. And the first command that needs to be implemented is a discovery
> > of CXL devices, switches, and FM instances because we need to identify
> > a CXL object somehow for any other command.
>
> Agreed, discovery of devices and capabilities is definitely where to start
> + I think presenting that as a redfish model.
>
> Jonathan
>
> >
> > Thanks,
> > Slava.
> >
>
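
On the 'give host A this much memory with this characteristic set' option
quoted above, a rough sketch of what owning the allocation across devices
could look like inside the FM daemon. This is a plain first-fit pass, chosen
only for illustration and not something anyone in the thread proposed.
`Device`, `allocate`, and the `kind` tag are hypothetical names, and a real
implementation would presumably go through the DCD add capacity flow rather
than update an in-memory record.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical view of one memory device as tracked by the FM."""
    name: str
    kind: str        # assumed characteristic tag, e.g. "dram"
    free_mib: int    # capacity not yet handed to any host


def allocate(devices, host, size_mib, kind):
    """First-fit: take capacity from matching devices in list order.

    Returns a list of (host, device name, MiB) grants. Raises ValueError
    and leaves every device untouched if the request cannot be met in full.
    """
    candidates = [d for d in devices if d.kind == kind and d.free_mib > 0]
    if sum(d.free_mib for d in candidates) < size_mib:
        raise ValueError(f"not enough free {kind} capacity for {host}")

    grants = []
    remaining = size_mib
    for dev in candidates:
        if remaining == 0:
            break
        take = min(dev.free_mib, remaining)
        dev.free_mib -= take
        remaining -= take
        grants.append((host, dev.name, take))
    return grants
```

In the other mode described in the thread, where higher level software picks
the devices, the daemon would skip the selection loop and simply apply the
grants it is handed.
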