Re: [External] [LSF/MM/BPF TOPIC] CXL Fabric Manager (FM) architecture

On Wed, 1 Feb 2023 12:04:56 -0800
"Viacheslav A.Dubeyko" <viacheslav.dubeyko@xxxxxxxxxxxxx> wrote:

> Hi Jonathan,
> 
> > On Jan 31, 2023, at 9:41 AM, Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:
> > 
> > On Mon, 30 Jan 2023 11:11:23 -0800
> > "Viacheslav A.Dubeyko" <viacheslav.dubeyko@xxxxxxxxxxxxx> wrote:
> >   
> >> Hello,  
> > 
> > Hi Slava,
> > 
> > I'll throw some opinions at this :)
> >   
> >> 
> >> I would like to suggest a discussion of the Fabric Manager (FM) architecture. As far as
> >> I can see, the FM architecture requires: (1) an FM configuration tool, (2) an FM daemon,
> >> (3) QEMU emulation of CXL hardware features. The FM daemon receives requests from the
> >> configuration tool and executes commands by interacting with the kernel-space subsystem
> >> and the CXL switch (which can be emulated by QEMU). So, the key questions for discussion:
> > 
> > It's worth describing the operating modes to be supported. You kind of cover this later,
> > but I think pulling it out makes it clearer that we want one bit of software to
> > do several different things.
> > 
> > 1) FM separate from hosts and talked to by higher level orchestration software
> >   but using a Switch CCI or MHD mailbox (over PCI)
> >   This one is fairly easy because any security / shooting self in foot problems
> >   are an issue for higher level software. 
> > 2) FM on host.  Probably mostly going to be relevant for debug, but may use
> >   the same mailbox as is being used by the existing CXL drivers (for Multi
> >   Head Device it might be the end point mailbox, for Multi Logical Device
> >   behind a switch it might be the switch mailbox).
> > 3) All out of band (MCTP or similar - want some shared code, but no
> >   need for anything in kernel as far as I can tell).
> >   
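The three operating modes differ mainly in which transport carries FM-API traffic, so one piece of software can treat the mode as a configuration choice. A minimal Python sketch of that mapping follows; the enum, the transport labels and the helper functions are hypothetical names, not an existing API, and the assumption that out-of-band MCTP needs no CXL-specific kernel code is taken from the comment above.

```python
from enum import Enum


class FmMode(Enum):
    """Hypothetical labels for the three FM operating modes listed above."""
    EXTERNAL = "external"        # FM separate from hosts, driven by an orchestrator
    ON_HOST = "on_host"          # FM on a host that also uses the devices (mostly debug)
    OUT_OF_BAND = "out_of_band"  # MCTP or similar, entirely out of band


# Transport used by each mode: modes 1 and 2 both go through a PCI mailbox
# (Switch CCI or MHD mailbox); mode 3 uses MCTP.
TRANSPORT_FOR_MODE = {
    FmMode.EXTERNAL: "pci_mailbox",
    FmMode.ON_HOST: "pci_mailbox",
    FmMode.OUT_OF_BAND: "mctp",
}


def needs_cxl_mailbox_support(mode: FmMode) -> bool:
    """True if the mode relies on the kernel's CXL mailbox transport.

    Assumption: out-of-band MCTP needs no CXL-specific kernel code.
    """
    return TRANSPORT_FOR_MODE[mode] == "pci_mailbox"


def shares_mailbox_with_host_driver(mode: FmMode) -> bool:
    """An on-host FM may use the same mailbox as the existing CXL drivers."""
    return mode is FmMode.ON_HOST
```

The on-host case is the only one where the FM and the existing CXL drivers could contend for the same mailbox, which is where any "don't do the wrong thing" coordination would be needed.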
> 
> Most probably, we will have multiple FM implementations in firmware.
> Yes, FM on host could be important for debug and to verify the correctness
> of firmware-based implementations. But an FM daemon on the host could be important
> to receive notifications and react to these events. Also, journalling
> of events/messages could be an important responsibility of the FM daemon
> on the host.

I agree with an FM daemon somewhere (potentially running on the BMC type chip
that also has the lower level FM-API access).  I think it is somewhat
separate from the rest of this, on the basis that it may well just be talking redfish
to the FM, and there are lots of tools for that sort of handling already.

> 
> >   
> >> (1) How to distribute functionality between user-space and kernel-space?  
> > 
> > Kernel for transport if mailbox based (switch or MHD).
> > Possibly help in kernel with the host to Multiheaded device FM LD tunneling
> > and host to switch to Multi Logical Device - Logical Device tunneling
> > but that could also be left to userspace.
> >   
> 
> People love to move everything to user-space now. But I believe we could have
> both kernel-space and user-space solutions. I think we need to check which approach
> is more efficient and elegant.

Agreed - though I think we need to remember that running this on the host that is
using the devices isn't likely to be a common real use case.  So we should
design for that to 'work' but not to be the assumed method. Hence, if any
synchronization is needed, it might be a case of 'don't do the wrong thing'
rather than hard protections.

> 
> > If MCTP, use the existing MCTP framework, which is independent of the underlying transport.
> > I posted a PoC for how this might work a while ago (a hack on top of MCTP-I2C
> > and some emulation). In the cover letter of the emulation PoC
> >   
> 
> Sounds interesting. Let me check it. But I believe it may not be the first task
> in this implementation. :)

Some level of MCTP support needs to come early enough that we don't get
any design decisions wrong.  For MCTP I think the vast majority of handling
has to be in userspace. I don't want to end up with duplication because we did
some of that down in the kernel for the mailbox solution.

> 
> > 
> > I think everything else belongs in userspace. I believe there are redfish APIs
> > etc that would then be used to query and drive the userspace program from an
> > orchestrator or similar level software.
> >   
> 
> I need to check the redfish API. It sounds reasonable to employ some existing
> framework.
> 
> >> (2) What functionality does kernel-space need to provide to implement FM features?
> >>      Which kernel-space functionality do we still need to implement?
> > 
> > Very little needed if we just expose the transport via PCI mailboxes.
> > There is a possible concern that FM-API commands are frequently
> > destructive and currently we don't let userspace poke destructive
> > commands. That may just need a specific opt in to say we know we
> > can shoot ourselves in the foot.
> >   
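The explicit opt-in for destructive FM-API commands can be pictured as a gate the tool consults before sending anything. A minimal Python sketch, assuming the opt-in is a simple flag; the command names and classes are hypothetical, and where the real opt-in lives (kernel interface, tool, or both) is left open by the discussion, so this shows only a user-space check.

```python
class DestructiveCommandError(PermissionError):
    """Raised when a destructive FM-API command is sent without opt-in."""


# Hypothetical command names for illustration; the real list would have to be
# derived from the FM-API command definitions in the CXL specification.
DESTRUCTIVE_COMMANDS = frozenset({"unbind_vppb", "set_ld_allocations"})


class CommandGate:
    """Refuses destructive commands unless the caller explicitly opted in."""

    def __init__(self, allow_destructive: bool = False) -> None:
        # The "we know we can shoot ourselves in the foot" switch.
        self.allow_destructive = allow_destructive

    def check(self, command: str) -> None:
        if command in DESTRUCTIVE_COMMANDS and not self.allow_destructive:
            raise DestructiveCommandError(
                f"{command!r} is destructive; opt in with allow_destructive=True"
            )
```

Non-destructive queries pass straight through, so only the risky subset pays the cost of an extra flag.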
> 
> I think this is why we need the kernel. It seems to me that we need user-space
> and kernel-space collaboration here.

I think it will be lightweight and looks like the existing CXL mailbox userspace
interface (some commands are the same).

> 
> >> (3) Do we need MCTP (Management Component Transport Protocol), or can some other
> >>      protocol be used for interaction between the configuration tool, FM daemon, and
> >>      CXL switch?
> > 
> > Yes MCTP is needed.
> > I don't think we want the actual management code to be different
> > depending on transport / protocol.  However we might layer it so that there
> > is an interface program that sits between the management library / program and
> > the FM-API transport.
> > 
> > Note I was struggling to find a suitable MCTP interface to emulate - so would
> > welcome suggestions on that.  I hacked the above PoC using an aspeed i2c
> > controller that supported the right magic combination of features needed
> > for MCTP over I2C but it doesn't have ACPI support which rather limits
> > usage (and I doubt anyone will be keen on adding ACPI support just to
> > test CXL related code :)  If anyone knows of a suitable MCTP host we
> > could use for this that would be great (MCTP over PCI VDM might be nice for
> > example)
> >   
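The layering suggested above, where management code stays the same regardless of transport and an interface layer sits between it and the FM-API transport, can be sketched with a structural interface. This is a minimal Python sketch; `FmApiTransport`, `FmApiClient`, `LoopbackTransport` and the opcode value are hypothetical placeholders, and real opcodes would come from the CXL specification.

```python
from typing import List, Protocol, Tuple

# Placeholder opcode for illustration only; real values come from the CXL spec.
IDENTIFY_OPCODE = 0x0001


class FmApiTransport(Protocol):
    """Anything that can carry one FM-API request and return its response."""

    def send(self, opcode: int, payload: bytes) -> bytes:
        ...


class FmApiClient:
    """Management logic: unchanged whichever transport sits underneath."""

    def __init__(self, transport: FmApiTransport) -> None:
        self.transport = transport

    def identify(self) -> bytes:
        return self.transport.send(IDENTIFY_OPCODE, b"")


class LoopbackTransport:
    """Test double standing in for a PCI mailbox or MCTP transport."""

    def __init__(self, reply: bytes = b"ok") -> None:
        self.reply = reply
        self.sent: List[Tuple[int, bytes]] = []

    def send(self, opcode: int, payload: bytes) -> bytes:
        self.sent.append((opcode, payload))  # record what the client asked for
        return self.reply
```

A mailbox-backed or MCTP-backed class would only need to provide the same `send` method; the client code does not change.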
> 
> Let us start implementing some commands/features and we will figure it out.
> But I assume we need to start with something like CXL device discovery.

Sure - some of the kernel side of that was present in the switch-cci mailbox PoC
Obviously tooling was a test hack though ;)

> 
> >> (4) What architecture does the FM implementation require?
> >> (5) Does it make sense to use Rust as implementation language?  
> > 
> > Take your pick ;) First person to write a lot of code gets to pick the language.
> >   
> 
> Yeah, I see the point. Rust can provide some benefits (its memory safety model, for example).
> But it could introduce some collaboration issues and make implementation
> slower. Everybody develops in C, so switching to Rust may not be an easy
> target.
> 
> <skipped>
> 
> >> 
> >> 
> >> FM configuration tool requires such commands:  
> > 
> > A command line tool is fine, but likely the 'real' FM configuration interface will be via
> > a protocol (e.g. redfish).
> > There is a WIP for CXL, though not sure on latest status on this (document on there is from
> > 2021)
> > 
> > So ultimately I'd expect fm_cli to be a wrapper around libredfish / redfishtool
> > that just makes it a bit easier to poke
> > with common commands.
> > 
> > I'm far from an expert of redfish so may have this all wrong.
> >   
> 
> Sounds reasonable to me. Let me check how good it could be for this project.
> 
> >> 
> >> Discover - discover available agents
> >> Subcommands:
> >>    - fm_cli discover fm - discover FM instances  
> > 
> > If we are allowing more than one FM then I'd expect all the
> > other commands to be directed at that by some sort of FM specific
> > ID. If only one, what does this command do that isn't better
> > done with fm get_info?
> >   
> 
> Yes, we need to identify every object somehow. And it’s an interesting point.
> From a usability point of view, some human-friendly names could be good.
> But a firmware-based FM implementation needs to follow the same rules.
> And it seems to me that the CXL specification should define how a CXL FM or
> CXL device identifies itself. Anyway, we need to ask the CXL device and it should
> return some ID to us. Probably, it will be some GUID or similar number.
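Whatever identifier the specification ends up mandating, the tool can key every object by the ID the device reports and layer optional human-friendly aliases on top, which keeps firmware and host implementations on the same rules. A minimal Python sketch, assuming each object reports a UUID-like value; `FabricObject`, `ObjectRegistry` and the alias scheme are hypothetical.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FabricObject:
    obj_id: uuid.UUID            # ID reported by the FM or device itself
    kind: str                    # e.g. "fm", "switch", "type3"
    alias: Optional[str] = None  # optional human-friendly name


class ObjectRegistry:
    """Canonical lookup by reported ID, with aliases layered on top."""

    def __init__(self) -> None:
        self._by_id: Dict[uuid.UUID, FabricObject] = {}
        self._by_alias: Dict[str, uuid.UUID] = {}

    def add(self, obj: FabricObject) -> None:
        # Check the alias first so a rejected add leaves no partial state.
        if obj.alias is not None and obj.alias in self._by_alias:
            raise ValueError(f"alias {obj.alias!r} is already in use")
        self._by_id[obj.obj_id] = obj
        if obj.alias is not None:
            self._by_alias[obj.alias] = obj.obj_id

    def resolve(self, ref: str) -> FabricObject:
        """Accept an alias or a UUID string; raises KeyError if unknown."""
        if ref in self._by_alias:
            return self._by_id[self._by_alias[ref]]
        try:
            key = uuid.UUID(ref)
        except ValueError:
            raise KeyError(ref) from None
        return self._by_id[key]
```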
> 
> >   
> >>    - fm_cli discover cxl_devices - discover CXL devices
> >>    - fm_cli discover logical_devices - discover logical devices  
> > 
> > Discover switches as well.
> >   
> 
> I assumed that a CXL switch is a subclass of CXL devices. Do you mean that
> it is an independent case?

Maybe it's simpler broken out. What you do with a switch is often very different
from what you do with type 3 devices.

> 
> >> 
> >> FM - manage Fabric Manager
> >> Subcommands:
> >>    - fm_cli fm get_info - get FM status/info
> >>    - fm_cli fm start - start FM instance
> >>    - fm_cli fm restart - restart FM instance
> >>    - fm_cli fm stop - stop FM instance
> >>    - fm_cli fm get_config - get FM configuration
> >>    - fm_cli fm set_config - set FM configuration  
> > 
> > I'd keep this slim for now.  No idea what FM config we might want to
> > set, so don't bother listing commands yet.
> >   
> 
> Yeah, it’s not completely clear yet. But I assume we can consider
> configuration options such as:
> (1) register to receive event notifications
> (2) logging of events
> (3) errors handling
> 
> >>    - fm_cli fm get_events - get event records  
> > Not sure what FM would have in the way of events (as opposed to
> > things it is talking to).
> >   
> 
> I think FM can log events. If we consider an FM daemon on the host, then it
> could issue messages to the end user in reaction to some events.
> 
> >> 
> >> Switch - manage CXL switch
> >> Subcommands:
> >>    - fm_cli switch get_info - get CXL switch info/status  
> > 
> > These all need an ID field of some type to identify which switch.
> >   
> 
> Yeah, it is exactly what we need for every command. We need to identify
> an object for a request.
> 
> >>    - fm_cli switch get_config - get switch configuration
> >>    - fm_cli switch set_config - set switch configuration  
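Since every switch and DCD command needs to say which object it targets, the CLI can enforce that at parse time. A minimal sketch using Python's standard `argparse`; the `--id` flag and the exact group layout are hypothetical, and `discover` is left out on the assumption that it produces IDs rather than consuming them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="fm_cli")
    groups = parser.add_subparsers(dest="group", required=True)

    # Command groups and subcommands, following the lists in this thread.
    commands = {
        "switch": ["get_info", "get_config", "set_config"],
        "dcd": ["get_info", "get_capacity_config", "set_capacity_config",
                "get_extent_list", "add_capacity", "release_capacity"],
    }
    for group, subcommands in commands.items():
        group_parser = groups.add_parser(group)
        actions = group_parser.add_subparsers(dest="command", required=True)
        for name in subcommands:
            action = actions.add_parser(name)
            # Every command must identify the object it is directed at.
            action.add_argument("--id", required=True,
                                help="ID of the target switch or device")
    return parser
```

For example, `fm_cli switch get_info --id sw0` parses, while `fm_cli switch get_info` is rejected with a usage error.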
> 
> <skipped>
> 
> >> 
> >> DCD (Dynamic Capacity Device) - manage Dynamic Capacity Device
> >> Subcommands:
> >>    - fm_cli dcd get_info - Get DCD Info (retrieves the number of supported hosts,
> >>         total Dynamic Capacity of the device, and supported region configurations)
> >>    - fm_cli dcd get_capacity_config - Get Host Dynamic Capacity Region Configuration
> >>         (retrieves the Dynamic Capacity configuration for a specified host)
> >>    - fm_cli dcd set_capacity_config - Set Dynamic Capacity Region Configuration
> >>         (sets the configuration of a DC Region)
> >>    - fm_cli dcd get_extent_list - Get DCD Extent Lists (retrieves the Dynamic Capacity
> >>         Extent List for a specified host)
> >>    - fm_cli dcd add_capacity - Initiate Dynamic Capacity Add (initiates the addition of
> >>         Dynamic Capacity to the specified region on a host)  
> > 
> > That one is complex ;) Probably needs a whole man page to itself.
> >   
> 
> Currently, it’s only a declaration of the command set. Yeah, the implementation will be complex. :)
> 
> >>    - fm_cli dcd release_capacity - Initiate Dynamic Capacity Release (initiates the release of
> >>         Dynamic Capacity from a host)
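The DCD commands above imply per-region bookkeeping: a region has a fixed capacity, and each host holds a list of extents carved from it. The following is a deliberately simplified Python model of that bookkeeping, assuming byte-granular extents and first-fit placement; it ignores block sizes, tags and the add/release handshake defined by the specification, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Extent:
    start: int   # offset within the DC region, in bytes
    length: int  # extent size, in bytes


@dataclass
class DcRegion:
    """Toy bookkeeping for one Dynamic Capacity region shared by several hosts."""

    capacity: int
    extents: Dict[str, List[Extent]] = field(default_factory=dict)  # host -> extents

    def _allocated(self) -> List[Extent]:
        # All extents handed out to any host, in address order.
        return sorted((e for exts in self.extents.values() for e in exts),
                      key=lambda e: e.start)

    def add_capacity(self, host: str, length: int) -> Extent:
        """First fit: place the new extent in the lowest gap large enough."""
        if length <= 0:
            raise ValueError("length must be positive")
        cursor = 0
        for ext in self._allocated():
            if ext.start - cursor >= length:
                break
            cursor = max(cursor, ext.start + ext.length)
        if cursor + length > self.capacity:
            raise ValueError("not enough contiguous free capacity in region")
        extent = Extent(cursor, length)
        self.extents.setdefault(host, []).append(extent)
        return extent

    def release_capacity(self, host: str, extent: Extent) -> None:
        self.extents[host].remove(extent)

    def get_extent_list(self, host: str) -> List[Extent]:
        return list(self.extents.get(host, []))
```

Freed space is reused: after releasing a host's extent at offset 0, the next small request lands back at offset 0.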
> >> 
> >> The FM daemon receives requests from the configuration tool and executes commands by
> >> interacting with kernel-space subsystems. The responsibilities of the FM daemon could be:
> >>    - Execute configuration tool commands
> >>    - Manage hot-add and hot-removal of devices  
> > 
> > In what sense?  I'd expect it to notify some higher level entity
> > (orchestrator or similar) but not sure I see what management the
> > FM would do.  
> >   
> 
> I assume that if FM manages some metadata, then hot-add or hot-removal could
> require some metadata corrections. Also, hot-add and hot-removal can generate some
> events that FM can receive and process somehow. For example, it is possible to log
> event messages into some journal.

Ok. Potentially stuff there - though exactly which layer ends up managing this
stuff isn't obvious to me yet.
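Whichever layer ends up owning it, the journalling of hot-add and hot-removal events mentioned above amounts to an append-only record that can be queried later. A minimal Python sketch, assuming an in-memory bounded journal; the event kinds, field names and classes are hypothetical, and persistence or forwarding to an orchestrator is out of scope here.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class FabricEvent:
    timestamp: float
    source_id: str   # switch or device that raised the event
    kind: str        # e.g. "hot_add", "hot_remove", "surprise_remove"
    detail: str = ""


class EventJournal:
    """Bounded in-memory journal; the oldest entries are dropped when full."""

    def __init__(self, max_entries: int = 1024) -> None:
        self._events: Deque[FabricEvent] = deque(maxlen=max_entries)

    def record(self, source_id: str, kind: str, detail: str = "") -> FabricEvent:
        event = FabricEvent(time.time(), source_id, kind, detail)
        self._events.append(event)
        return event

    def since(self, timestamp: float) -> List[FabricEvent]:
        return [e for e in self._events if e.timestamp >= timestamp]

    def by_source(self, source_id: str) -> List[FabricEvent]:
        return [e for e in self._events if e.source_id == source_id]
```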

> 
> >>    - Manage surprise removal of devices  
> > 
> > Likewise, beyond reporting I wouldn't expect the FM daemon to have any idea
> > what to do in the way of managing this.  Scream loudly?
> >   
> 
> Maybe it could require notifying application(s). Imagine that an application
> uses some resources from the removed device. Maybe the FM can manage kernel-space
> metadata correction and help handle application requests to non-existent
> entities.

Notifications for the host are likely to come via inband means - so type3 driver
handling rather than related to FM.  As far as the host is concerned this is the
same as case where there is no FM and someone ripped a device out.

There might indeed be metadata to manage, but I doubt it will have anything to
do with the kernel.

> 
> >>    - Receive and handle event notifications from the CXL switch
> >>    - Logging events
> >>    - Memory allocation and QoS Telemetry management
> >>    - Error/Failure handling  
> > 
> > I'm not sure on separation of role between this component and
> > higher level policy / admin driven software.
> > 
> > For memory allocation it might take a 'give host A this much
> > memory with this characteristic set' command and own the
> > allocations across all present devices, or it might just
> > act as an interface layer to higher level software that does
> > the fine detail of figuring out which device to allocate memory
> > from to satisfy such a request.
> > 
> > Whilst I agree having a broad vision for an interface is good
> > there are a lot of subtle details in some of these commands
> > so I'd not spend too long refining the whole lot. Probably better
> > to look at them one at a time and then just have whoever ends
> > up maintaining / reviewing this thing responsible for making sure the
> > parameter format etc is consistent across commands.
> >   
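If the FM takes the first role described above and owns allocations across all present devices, a request like "give host A this much memory with this characteristic set" reduces to a placement policy. A minimal Python sketch, assuming devices are tagged with hypothetical characteristic strings and that a request may be split across devices; the greedy largest-free-first policy is an illustrative choice, not something proposed in the thread.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass
class MemoryDevice:
    dev_id: str
    free_bytes: int
    characteristics: FrozenSet[str]  # hypothetical tags, e.g. {"persistent"}


def place_request(devices: Iterable[MemoryDevice], size: int,
                  required: FrozenSet[str]) -> List[Tuple[str, int]]:
    """Plan how to satisfy 'give host A `size` bytes with `required` traits'.

    Only devices that have every required characteristic are eligible, and
    capacity is taken greedily from the eligible device with most free space.
    Returns (dev_id, bytes) pairs; nothing is actually allocated here.
    """
    eligible = sorted((d for d in devices if required <= d.characteristics),
                      key=lambda d: d.free_bytes, reverse=True)
    if sum(d.free_bytes for d in eligible) < size:
        raise ValueError("request cannot be satisfied by eligible devices")
    plan: List[Tuple[str, int]] = []
    remaining = size
    for dev in eligible:
        if remaining == 0:
            break
        take = min(dev.free_bytes, remaining)
        if take > 0:
            plan.append((dev.dev_id, take))
            remaining -= take
    return plan
```

In the second role (FM as a thin interface layer), this function would instead live in the higher-level orchestration software and the FM would only execute the resulting plan.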
> 
> Yes, I agree. Let’s do it step by step. I believe we need to start by
> implementing an application that processes commands and does nothing
> at first. The first command that needs to be implemented is discovery
> of CXL devices, switches, and FM instances, because we need to identify
> a CXL object somehow for any other command.
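The "processes commands and does nothing at first" milestone can be a dispatcher in which every command is accepted and unimplemented ones report that explicitly, with discovery as the first handler to fill in. A minimal Python sketch; the dispatcher, the command naming and the stub's response shape are hypothetical.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


class CommandDispatcher:
    """Routes parsed commands to handlers; unimplemented ones are explicit no-ops."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self._handlers[name] = handler

    def dispatch(self, name: str, args: dict) -> dict:
        handler = self._handlers.get(name)
        if handler is None:
            # First milestone: accept the command, do nothing, and say so.
            return {"command": name, "status": "not_implemented"}
        return handler(args)


def discover_stub(args: dict) -> dict:
    # Placeholder until discovery talks to real (or QEMU-emulated) hardware.
    return {"command": "discover", "status": "ok", "objects": []}
```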

Agreed, discovery of devices and capabilities is definitely where to start,
and I think we should present that as a redfish model.
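Presenting discovery results as a Redfish model means exposing each discovered object as a resource at its own URI, with collections linking to members. The sketch below is loosely modeled on Redfish Fabric resources (`@odata.id`, `Members`, `Members@odata.count`) and keeps switches and endpoints in separate collections, as suggested earlier in the thread; the function name and property choices such as `FabricType: "CXL"` are assumptions that have not been checked against the DMTF schemas.

```python
from typing import Dict, List


def build_resources(fabric_id: str, switch_ids: List[str],
                    endpoint_ids: List[str]) -> Dict[str, dict]:
    """Map URI -> JSON body, one entry per resource, as a server would serve them."""
    base = f"/redfish/v1/Fabrics/{fabric_id}"
    resources: Dict[str, dict] = {
        base: {
            "@odata.id": base,
            "Id": fabric_id,
            "FabricType": "CXL",  # assumption: value not verified against the schema
            "Switches": {"@odata.id": f"{base}/Switches"},
            "Endpoints": {"@odata.id": f"{base}/Endpoints"},
        }
    }
    # Switches and endpoints (e.g. type 3 devices) are kept in separate collections.
    for kind, ids in (("Switches", switch_ids), ("Endpoints", endpoint_ids)):
        coll = f"{base}/{kind}"
        resources[coll] = {
            "@odata.id": coll,
            "Members": [{"@odata.id": f"{coll}/{i}"} for i in ids],
            "Members@odata.count": len(ids),
        }
        for i in ids:
            resources[f"{coll}/{i}"] = {"@odata.id": f"{coll}/{i}", "Id": i}
    return resources
```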

Jonathan

> 
> Thanks,
> Slava.
> 





