Re: [LSF/MM/BPF TOPIC] CXL Fabric Manager (FM) architecture

On Mon, 30 Jan 2023 11:11:23 -0800
"Viacheslav A.Dubeyko" <viacheslav.dubeyko@xxxxxxxxxxxxx> wrote:

> Hello,

Hi Slava,

I'll throw some opinions at this :)

> 
> I would like to suggest a Fabric Manager (FM) architecture discussion. As far as I can see,
> the FM architecture requires: (1) an FM configuration tool, (2) an FM daemon, (3) QEMU emulation
> of CXL hardware features. The FM daemon receives requests from the configuration tool and
> executes commands by interacting with the kernel-space subsystem and the CXL switch
> (which can be emulated by QEMU). So, the key questions for discussion:

It's worth describing the operating modes to be supported. You kind of cover this later,
but I think pulling them out makes it clearer that we want one bit of software to
do several different things.

1) FM separate from the hosts, driven by higher-level orchestration software,
   but using a Switch CCI or MHD mailbox (over PCI).
   This one is fairly easy because any security / shooting-self-in-foot problems
   are an issue for the higher-level software.
2) FM on a host. Probably mostly going to be relevant for debug, but it may use
   the same mailbox as is being used by the existing CXL drivers (for a Multi
   Head Device it might be the endpoint mailbox, for a Multi Logical Device
   behind a switch it might be the switch mailbox).
3) All out of band (MCTP or similar). We want some shared code, but there is no
   need for anything in the kernel as far as I can tell.


> (1) How to distribute functionality between user-space and kernel-space?

The kernel provides the transport if it is mailbox based (switch or MHD).
The kernel could possibly help with host to Multi-Headed Device FM LD tunneling
and host to switch to Multi Logical Device - Logical Device tunneling,
but that could also be left to userspace.

If MCTP, use the existing MCTP framework, which is independent of the underlying
transport. I posted a PoC for how this might work a while ago (a hack on top of
MCTP-I2C and some emulation); see the cover letter of the emulation PoC:
https://lore.kernel.org/linux-cxl/20220520170128.4436-1-Jonathan.Cameron@xxxxxxxxxx/

I think everything else belongs in userspace. I believe there are Redfish APIs
etc. that would then be used to query and drive the userspace program from an
orchestrator or similar higher-level software.

> (2) Which functionality does kernel space need to provide to implement FM features?
>       Which kernel-space functionality do we still need to implement?

Very little is needed if we just expose the transport via PCI mailboxes.
There is a possible concern that FM-API commands are frequently
destructive, and currently we don't let userspace poke destructive
commands. That may just need a specific opt-in to say we know we
can shoot ourselves in the foot.
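Here is a minimal C sketch of such an opt-in. The opcode values are my reading of the CXL 3.0 FM-API command tables (command set in the high byte) and should be checked against the spec. Which commands count as "destructive" is my own judgment call. fm_cmd_allowed() and allow_destructive are hypothetical names, not existing kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical list of FM-API opcodes treated as destructive.
 * Values per CXL 3.0 as I read them; the selection is a judgment call.
 */
static const uint16_t fm_destructive_opcodes[] = {
	0x5102,	/* Physical Port Control (can reset ports) */
	0x5201,	/* Bind vPPB */
	0x5202,	/* Unbind vPPB */
	0x5302,	/* Send LD CXL.io Memory Request (can write memory) */
	0x5402,	/* Set LD Allocations */
	0x5602,	/* Set Dynamic Capacity Region Configuration */
	0x5605,	/* Initiate Dynamic Capacity Release */
};

static bool fm_opcode_is_destructive(uint16_t opcode)
{
	size_t n = sizeof(fm_destructive_opcodes) / sizeof(fm_destructive_opcodes[0]);

	for (size_t i = 0; i < n; i++)
		if (fm_destructive_opcodes[i] == opcode)
			return true;
	return false;
}

/* Refuse destructive commands unless the caller explicitly opted in. */
bool fm_cmd_allowed(uint16_t opcode, bool allow_destructive)
{
	return allow_destructive || !fm_opcode_is_destructive(opcode);
}
```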

> (3) Do we need MCTP (Management Component Transport Protocol), or can some other
>       protocol be used for interaction between the configuration tool, FM daemon, and
>       CXL switch?

Yes, MCTP is needed.
I don't think we want the actual management code to be different
depending on transport / protocol. However, we might layer it so that
an interface program sits between the management library / program and
the FM-API transport.
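To make that layering concrete, here is a sketch in which the management code only sees one "send a CCI request" hook. Each operating mode (PCI mailbox, host mailbox, MCTP) would plug in its own implementation. All names here (enum fm_mode, struct fm_transport, fm_send(), the loopback transport) are hypothetical. The loopback stands in for a real mailbox or MCTP backend so the layering can be exercised without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for the three operating modes listed earlier. */
enum fm_mode {
	FM_MODE_EXTERNAL_PCI_MAILBOX,	/* separate FM host, Switch CCI or MHD mailbox */
	FM_MODE_HOST_MAILBOX,		/* FM on a host, shares the CXL driver mailbox */
	FM_MODE_OUT_OF_BAND_MCTP,	/* MCTP over I2C, PCI VDM, ... */
};

/* Every transport implements the same "send one CCI request" hook. */
struct fm_transport {
	enum fm_mode mode;
	int (*send)(void *priv, uint16_t opcode,
		    const void *in, size_t in_len,
		    void *out, size_t out_len, size_t *out_used);
	void *priv;
};

/* Management code calls this and never a transport directly. */
int fm_send(const struct fm_transport *t, uint16_t opcode,
	    const void *in, size_t in_len,
	    void *out, size_t out_len, size_t *out_used)
{
	assert(t && t->send);
	return t->send(t->priv, opcode, in, in_len, out, out_len, out_used);
}

/* Loopback transport for testing: echoes the request payload back. */
int loopback_send(void *priv, uint16_t opcode,
		  const void *in, size_t in_len,
		  void *out, size_t out_len, size_t *out_used)
{
	(void)priv;
	(void)opcode;
	if (in_len > out_len)
		return -1;	/* response buffer too small */
	memcpy(out, in, in_len);
	*out_used = in_len;
	return 0;
}
```

With this split, a CLI or Redfish front end links against fm_send() and never needs to know which of the three modes is in use.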

Note I was struggling to find a suitable MCTP interface to emulate, so I would
welcome suggestions on that. I hacked the above PoC using an Aspeed I2C
controller that supported the right magic combination of features needed
for MCTP over I2C, but it doesn't have ACPI support, which rather limits
usage (and I doubt anyone will be keen on adding ACPI support just to
test CXL related code :)  If anyone knows of a suitable MCTP host we
could use for this, that would be great (MCTP over PCI VDM might be nice, for
example).

> (4) What architecture does the FM implementation require?
> (5) Does it make sense to use Rust as implementation language?

Take your pick ;) First person to write a lot of code gets to pick the language.

> 
> CXL Fabric Manager (FM) is the application logic responsible for system composition and
> allocation of resources. The FM can be embedded in the firmware of a device such as
> a CXL switch, reside on a host, or could be run on a Baseboard Management Controller (BMC).
> CXL Specification 3.0 defines Fabric Management as: "CXL devices can be configured statically
> or dynamically via a Fabric Manager (FM), an external logical process that queries and configures
> the system’s operational state using the FM commands defined in this specification. The FM is
> defined as the logical process that decides when reconfiguration is necessary and initiates
> the commands to perform configurations. It can take any form, including, but not limited to,
> software running on a host machine, embedded software running on a BMC, embedded firmware
> running on another CXL device or CXL switch, or a state machine running within the CXL device
> itself.". CXL devices are configured by the FM through the Fabric Manager Application Programming
> Interface (FM API) command sets through a CCI (Component Command Interface). A CCI is
> exposed through a device’s Mailbox registers or through an MCTP-capable (Management
> Component Transport Protocol) interface.
> 
> FM API Commands (defined by CXL Specification 3.0):
> (1) Physical switch (Identify Switch Device, Get Physical Port State, Physical Port Control,
>       Send PPB (PCI-to-PCI Bridge) CXL.io Configuration Request).
> (2) Virtual Switch (Get Virtual CXL Switch Info, Bind vPPB (Virtual PCI-to-PCI Bridge),
>       Unbind vPPB, Generate AER (Advanced Error Reporting) Event).
> (3) MLD Port (Tunnel Management Command, Send LD (Logical Device) or
>      FMLD (Fabric Manager-owned Logical Device) CXL.io Configuration Request,
>      Send LD CXL.io Memory Request).
> (4) MLD Components (Get LD (Logical Device) Info, Get LD Allocations, Set LD Allocations,
>      Get QoS Control, Set QoS Control, Get QoS Status, Get QoS Allocated Bandwidth,
>      Set QoS Allocated Bandwidth, Get QoS Bandwidth Limit, Set QoS Bandwidth Limit).
> (5) Multi-Headed Devices (Get Multi-Headed Info).
> (6) DCD (Dynamic Capacity Device) Management (Get DCD Info, Get Host Dynamic
>      Capacity Region Configuration, Set Dynamic Capacity Region Configuration, Get DCD
>      Extent Lists, Initiate Dynamic Capacity Add, Initiate Dynamic Capacity Release).
> 
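FM-API opcodes are 16 bits wide, with the command set in bits 15:8 and the command within the set in bits 7:0. For example, Physical Switch is 51h and Virtual Switch is 52h. A small name table is handy for logging and for a CLI. The values below are my reading of the CXL 3.0 tables and cover only a subset. fm_cmd_lookup() and fm_cmd_opcode() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct fm_cmd_name {
	uint16_t opcode;
	const char *name;
};

/* Subset of the FM-API commands; values per CXL 3.0 as I read them. */
static const struct fm_cmd_name fm_cmds[] = {
	{ 0x5100, "Identify Switch Device" },
	{ 0x5101, "Get Physical Port State" },
	{ 0x5102, "Physical Port Control" },
	{ 0x5103, "Send PPB CXL.io Configuration Request" },
	{ 0x5200, "Get Virtual CXL Switch Info" },
	{ 0x5201, "Bind vPPB" },
	{ 0x5202, "Unbind vPPB" },
	{ 0x5203, "Generate AER Event" },
	{ 0x5300, "Tunnel Management Command" },
	{ 0x5400, "Get LD Info" },
	{ 0x5401, "Get LD Allocations" },
	{ 0x5402, "Set LD Allocations" },
	{ 0x5500, "Get Multi-Headed Info" },
	{ 0x5600, "Get DCD Info" },
};

#define FM_N_CMDS (sizeof(fm_cmds) / sizeof(fm_cmds[0]))

/* Command set lives in bits 15:8, command within the set in bits 7:0. */
static inline uint8_t fm_cmd_set(uint16_t opcode) { return (uint8_t)(opcode >> 8); }
static inline uint8_t fm_cmd_index(uint16_t opcode) { return (uint8_t)(opcode & 0xff); }

/* opcode -> name (e.g. for logs); NULL if not in the table. */
const char *fm_cmd_lookup(uint16_t opcode)
{
	for (size_t i = 0; i < FM_N_CMDS; i++)
		if (fm_cmds[i].opcode == opcode)
			return fm_cmds[i].name;
	return NULL;
}

/* name -> opcode (e.g. for a CLI); -1 if not in the table. */
int fm_cmd_opcode(const char *name)
{
	for (size_t i = 0; i < FM_N_CMDS; i++)
		if (strcmp(fm_cmds[i].name, name) == 0)
			return fm_cmds[i].opcode;
	return -1;
}
```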
> After the initial configuration is complete and a CCI on the switch is operational, an FM can
> send Management Commands to the switch. An FM may perform the following dynamic
> management actions on a CXL switch: (1) Query switch information and configuration details,
> (2) Bind or Unbind ports, (3) Register to receive and handle event notifications from the switch
> (e.g., hot plug, surprise removal, and failures). A switch with MLD (Multi-Logical Device)
> requires an FM to perform the following management activities: (1) MLD discovery,
> (2) LD (Logical Device) binding/unbinding, (3) Management Command Tunneling. The FM can
> connect to an MLD (Multi-Logical Device) over a direct connection or by tunneling its
> management commands through the CCI of the CXL switch to which the device is connected.
> The FM can perform the following operations: (1) Memory allocation and QoS Telemetry
> management, (2) Security (e.g., LD erasure after unbinding), (3) Error handling.
> 
> The FM configuration tool requires the following commands:

A command-line tool is fine, but likely the 'real' FM configuration interface will be via
a protocol (e.g. Redfish).
https://www.dmtf.org/standards/redfish
There is a WIP for CXL, though I'm not sure of its latest status (the document there is from
2021).

So ultimately I'd expect fm_cli to be a wrapper around libredfish / redfishtool
http://github.com/DMTF/RedFishTool etc. that just makes it a bit easier to poke at things
with common commands.

I'm far from an expert on Redfish, so I may have this all wrong.

> 
> Discover - discover available agents
> Subcommands:
>     - fm_cli discover fm - discover FM instances

If we are allowing more than one FM, then I'd expect all the
other commands to be directed at one of them by some sort of FM-specific
ID. If there is only one, what does this command do that isn't better
done with fm get_info?


>     - fm_cli discover cxl_devices - discover CXL devices
>     - fm_cli discover logical_devices - discover logical devices

Discover switches as well.

> 
> FM - manage Fabric Manager
> Subcommands:
>     - fm_cli fm get_info - get FM status/info
>     - fm_cli fm start - start FM instance
>     - fm_cli fm restart - restart FM instance
>     - fm_cli fm stop - stop FM instance
>     - fm_cli fm get_config - get FM configuration
>     - fm_cli fm set_config - set FM configuration

I'd keep this slim for now. I have no idea what FM config we might want to
set, so don't bother listing commands yet.

>     - fm_cli fm get_events - get event records
I'm not sure what the FM would have in the way of events (as opposed to
the things it is talking to).

> 
> Switch - manage CXL switch
> Subcommands:
>     - fm_cli switch get_info - get CXL switch info/status

These all need an ID field of some type to identify which switch.

>     - fm_cli switch get_config - get switch configuration
>     - fm_cli switch set_config - set switch configuration
> 
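These commands need an ID field. One way fm_cli could handle this is to dispatch from a table keyed on (object, verb), with a mandatory target ID. This sketch assumes an argv layout of `fm_cli <object> <verb> <id>`. The handler is a stub, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Every handler gets the ID of the switch / device / FM it targets. */
typedef int (*fm_cli_handler)(unsigned long id);

struct fm_cli_cmd {
	const char *object;	/* e.g. "switch" */
	const char *verb;	/* e.g. "get_info" */
	fm_cli_handler fn;
};

static unsigned long last_switch_id;	/* recorded by the stub below */

/* Stub: a real handler would send Identify Switch Device to switch 'id'. */
static int switch_get_info(unsigned long id)
{
	last_switch_id = id;
	return 0;
}

static const struct fm_cli_cmd fm_cli_cmds[] = {
	{ "switch", "get_info", switch_get_info },
};

/*
 * Assumed layout: fm_cli <object> <verb> <id>, id in decimal or 0x-hex.
 * Returns the handler's result, or -1 for a missing/bad ID or unknown command.
 */
int fm_cli_dispatch(int argc, char **argv)
{
	if (argc < 4)
		return -1;

	char *end;
	unsigned long id = strtoul(argv[3], &end, 0);
	if (argv[3][0] == '\0' || *end != '\0')
		return -1;

	for (size_t i = 0; i < sizeof(fm_cli_cmds) / sizeof(fm_cli_cmds[0]); i++)
		if (strcmp(argv[1], fm_cli_cmds[i].object) == 0 &&
		    strcmp(argv[2], fm_cli_cmds[i].verb) == 0)
			return fm_cli_cmds[i].fn(id);
	return -1;
}
```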
> Logical Device - manage logical devices
> Subcommands:
>     - fm_cli multi_headed_device info - retrieves the number of heads, number of
>            supported LDs, and Head-to-LD mapping of a Multi-Headed device
>     - fm_cli logical_device bind - bind logical device
>     - fm_cli logical_device unbind - unbind logical device
>     - fm_cli logical_device connect - connect Multi Logical Device to CXL switch
>     - fm_cli logical_device disconnect - disconnect Multi Logical Device from CXL switch
>     - fm_cli logical_device get_allocation - Get LD Allocations (retrieves the memory
>            allocations of the MLD)
>     - fm_cli logical_device set_allocation - Set LD Allocations (sets the memory allocation
>            for each LD)
>     - fm_cli logical_device get_qos_control - Get QoS Control (retrieves the MLD’s QoS
>            control parameters)
>     - fm_cli logical_device set_qos_control - Set QoS Control (sets the MLD’s QoS control
>            parameters)
>     - fm_cli logical_device get_qos_status - Get QoS Status (retrieves the MLD’s QoS Status)
>     - fm_cli logical_device get_qos_allocated_bandwidth - Get QoS Allocated Bandwidth
>           (retrieves the MLD’s QoS allocated bandwidth on a per-LD basis)
>     - fm_cli logical_device set_qos_allocated_bandwidth - Set QoS Allocated Bandwidth
>           (sets the MLD’s QoS allocated bandwidth on a per-LD basis)
>     - fm_cli logical_device get_qos_bandwidth_limit - Get QoS Bandwidth Limit (retrieves the
>           MLD’s QoS bandwidth limit on a per-LD basis)
>     - fm_cli logical_device set_qos_bandwidth_limit - Set QoS Bandwidth Limit (sets the
>           MLD’s QoS bandwidth limit on a per-LD basis)
>     - fm_cli logical_device erase - secure erase after unbinding
> 
> PCI-to-PCI Bridge - manage PPB (PCI-to-PCI Bridge)
> Subcommands:
>     - fm_cli ppb config - Send PPB (PCI-to-PCI Bridge) CXL.io Configuration Request

That one may want a more convenient interface, as a lot of commands would likely be sent
if the aim is to configure a device before binding. Also, I think CXL.io Memory requests
want to be here.

>     - fm_cli ppb bind - Bind vPPB (Virtual PCI-to-PCI Bridge inside a CXL switch that is
>            host-owned)
>     - fm_cli ppb unbind - Unbind vPPB (unbinds the physical port or LD from the virtual
>            hierarchy PPB)
> 
> Physical Port - manage physical ports
> Subcommands:
>     - fm_cli physical_port get_info - get state of physical port
>     - fm_cli physical_port control - control unbound ports and MLD ports, including issuing
>            resets and controlling sidebands
>     - fm_cli physical_port bind - bind physical port to vPPB (Virtual PCI-to-PCI Bridge)
>     - fm_cli physical_port unbind - unbind physical port from vPPB (Virtual PCI-to-PCI Bridge)
> 
> MLD (Multi-Logical Device) Port - manage Multi-Logical Device ports
> Subcommands:
>     - fm_cli mld_port tunnel - Tunnel Management Command (tunnels the provided command
>            to LD FFFFh of the MLD on the specified port)

Make it clear how the nesting of commands in a tunnel would be specified.

>     - fm_cli mld_port send_config - Send LD (Logical Device) or FMLD (Fabric
>            Manager-owned Logical Device) CXL.io Configuration Request
>     - fm_cli mld_port send_memory_request - Send LD CXL.io Memory Request
> 
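On how tunnel nesting could be specified: one option is an ordered list of hops, outermost first (for example switch port 3, then LD 1). The tool then wraps the inner command once per hop. The 4-byte per-hop header below holds the Port or LD ID, a reserved byte, and the 16-bit little-endian size of the tunneled command. This is a simplified reading of the Tunnel Management Command request that should be checked against the spec. It also omits the CCI message header each layer would carry. fm_tunnel_wrap() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FM_TUNNEL_HDR_LEN 4	/* assumed: ID, reserved, 16-bit LE size */

/*
 * Wrap 'inner' in one tunnel layer per hop. hops[0] is the outermost
 * (e.g. switch port), hops[n_hops - 1] the innermost (e.g. LD in an MLD).
 * Returns the total length written to 'out', or 0 if it does not fit.
 */
size_t fm_tunnel_wrap(const uint8_t *hops, size_t n_hops,
		      const uint8_t *inner, size_t inner_len,
		      uint8_t *out, size_t out_len)
{
	size_t total = inner_len + n_hops * FM_TUNNEL_HDR_LEN;

	if (total > out_len)
		return 0;

	/* Put the innermost command at the end, then prepend headers. */
	size_t start = total - inner_len;
	memcpy(out + start, inner, inner_len);

	for (size_t i = n_hops; i-- > 0;) {
		size_t payload = total - start;	/* bytes this layer carries */

		if (payload > 0xffff)
			return 0;
		start -= FM_TUNNEL_HDR_LEN;
		out[start + 0] = hops[i];		/* Port or LD ID */
		out[start + 1] = 0;			/* reserved */
		out[start + 2] = (uint8_t)(payload & 0xff);
		out[start + 3] = (uint8_t)(payload >> 8);
	}
	return total;
}
```

A CLI could then take the hop list as a single argument and leave the nesting arithmetic to this helper, so users never count bytes by hand.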
> DCD (Dynamic Capacity Device) - manage Dynamic Capacity Device
> Subcommands:
>     - fm_cli dcd get_info - Get DCD Info (retrieves the number of supported hosts,
>          total Dynamic Capacity of the device, and supported region configurations)
>     - fm_cli dcd get_capacity_config - Get Host Dynamic Capacity Region Configuration
>          (retrieves the Dynamic Capacity configuration for a specified host)
>     - fm_cli dcd set_capacity_config - Set Dynamic Capacity Region Configuration
>          (sets the configuration of a DC Region)
>     - fm_cli dcd get_extent_list - Get DCD Extent Lists (retrieves the Dynamic Capacity
>          Extent List for a specified host)
>     - fm_cli dcd add_capacity - Initiate Dynamic Capacity Add (initiates the addition of
>          Dynamic Capacity to the specified region on a host)

That one is complex ;) Probably needs a whole man page to itself.

>     - fm_cli dcd release_capacity - Initiate Dynamic Capacity Release (initiates the release of
>          Dynamic Capacity from a host)
> 
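Part of what makes Initiate Dynamic Capacity Add complex is the extent bookkeeping. Here is a minimal sketch of one check an FM might make before issuing it. The new extent must be block-aligned, lie inside the DC region, and not overlap extents already handed out. The structures are hypothetical, and the real request carries more fields than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a DC region and its extents (DPA ranges). */
struct dc_extent {
	uint64_t start;		/* device physical address */
	uint64_t len;
};

struct dc_region {
	uint64_t base;
	uint64_t len;
	uint64_t block_size;	/* extents must be multiples of this */
};

/* Check a proposed add against the region and already-allocated extents. */
bool dcd_add_is_valid(const struct dc_region *r,
		      const struct dc_extent *existing, size_t n,
		      const struct dc_extent *add)
{
	if (add->len == 0 || r->block_size == 0)
		return false;
	if (add->start % r->block_size || add->len % r->block_size)
		return false;

	/* Must lie fully inside the region (written to avoid overflow). */
	if (add->start < r->base || add->len > r->len ||
	    add->start - r->base > r->len - add->len)
		return false;

	/* Half-open ranges [start, start + len) must not intersect. */
	for (size_t i = 0; i < n; i++)
		if (add->start < existing[i].start + existing[i].len &&
		    existing[i].start < add->start + add->len)
			return false;
	return true;
}
```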
> The FM daemon receives requests from the configuration tool and executes commands by
> interacting with kernel-space subsystems. The responsibilities of the FM daemon could be:
>     - Execute configuration tool commands
>     - Manage hot-add and hot-removal of devices

In what sense? I'd expect it to notify some higher-level entity
(orchestrator or similar), but I'm not sure I see what management the
FM would do.

>     - Manage surprise removal of devices

Likewise, beyond reporting I wouldn't expect the FM daemon to have any idea
what to do in the way of managing this.  Scream loudly?

>     - Receive and handle event notifications from the CXL switch
>     - Logging events
>     - Memory allocation and QoS Telemetry management
>     - Error/Failure handling

I'm not sure about the separation of roles between this component and
higher-level policy / admin-driven software.

For memory allocation it might take a 'give host A this much
memory with this characteristic set' command and own the
allocations across all present devices, or it might just
act as an interface layer to higher level software that does
the fine detail of figuring out which device to allocate memory
from to satisfy such a request.
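Here is a deliberately simple sketch of the first option, where the FM owns allocations across all devices. It satisfies "give host A N bytes of class C" by first-fit across devices of that class. The request is split across devices when needed, and nothing is committed unless the whole request fits. The qos_class field stands in for "this characteristic set". First-fit is an arbitrary policy chosen for illustration, not a recommendation, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device view an FM might keep. */
struct fm_mem_dev {
	unsigned int id;
	unsigned int qos_class;	/* stand-in for "this characteristic set" */
	uint64_t free_bytes;
};

/* One piece of a host's allocation. */
struct fm_grant {
	size_t dev;		/* index into the devs[] array */
	uint64_t bytes;
};

/*
 * First-fit across devices of the requested class, splitting the request
 * when one device cannot satisfy it. Returns the number of grants, or -1
 * with devs[] unchanged if the request cannot be met.
 */
int fm_alloc_for_host(struct fm_mem_dev *devs, size_t n_devs,
		      unsigned int qos_class, uint64_t want,
		      struct fm_grant *grants, size_t max_grants)
{
	uint64_t avail = 0;

	for (size_t i = 0; i < n_devs; i++)
		if (devs[i].qos_class == qos_class)
			avail += devs[i].free_bytes;
	if (avail < want)
		return -1;

	/* Plan pass: decide the grants without touching devs[]. */
	size_t g = 0;
	uint64_t left = want;
	for (size_t i = 0; i < n_devs && left > 0; i++) {
		if (devs[i].qos_class != qos_class || devs[i].free_bytes == 0)
			continue;
		if (g == max_grants)
			return -1;	/* too fragmented; nothing committed */
		uint64_t take = devs[i].free_bytes < left ? devs[i].free_bytes : left;
		grants[g].dev = i;
		grants[g].bytes = take;
		g++;
		left -= take;
	}

	/* Commit pass: only reached when the whole request fits. */
	for (size_t k = 0; k < g; k++)
		devs[grants[k].dev].free_bytes -= grants[k].bytes;
	return (int)g;
}
```

In the second option, where higher-level software does the fine detail, the FM would instead be handed explicit (device, size) grants and would just issue Set LD Allocations or a Dynamic Capacity Add for each.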

Whilst I agree that having a broad vision for an interface is good,
there are a lot of subtle details in some of these commands,
so I'd not spend too long refining the whole lot. It is probably better
to look at them one at a time and then just have whoever ends
up maintaining / reviewing this thing be responsible for making sure the
parameter format etc. is consistent across commands.

Fun fun fun

Jonathan

> 
> Thanks,
> Slava.
> 





