Re: [External] [LSF/MM/BPF TOPIC] CXL Fabric Manager (FM) architecture

> On Feb 8, 2023, at 8:38 AM, Adam Manzanares <a.manzanares@xxxxxxxxxxx> wrote:
> 
> On Thu, Feb 02, 2023 at 09:54:02AM +0000, Jonathan Cameron wrote:
>> On Wed, 1 Feb 2023 12:04:56 -0800
>> "Viacheslav A.Dubeyko" <viacheslav.dubeyko@xxxxxxxxxxxxx> wrote:
>> 
>>>> 

<skipped>

>>> 
>>> Most probably, we will have multiple FM implementations in firmware.
>>> Yes, an FM on the host could be important for debugging and for verifying the
>>> correctness of firmware-based implementations. But an FM daemon on the host could
>>> also be important to receive notifications and react to these events. Also,
>>> journaling of events/messages could be an important responsibility of the FM
>>> daemon on the host.
>> 
>> I agree with an FM daemon somewhere (potentially running on the BMC type chip
>> that also has the lower level FM-API access).  I think it is somewhat
>> separate from the rest of this, on the basis that it may well just be talking
>> Redfish to the FM and there are lots of tools for that sort of handling already.
>> 
> 
> I would be interested in participating in a BoF about this topic. I wonder what
> happens when we have multiple switches with multiple FMs, each on a separate BMC.
> In this case, does it make more sense for the owner of the global FM state to
> be a user space application? Is this the job of the orchestrator?
> 
> The BMC-based FM seems to have scalability issues, but will we hit them in
> practice any time soon?

I had a discussion recently, and it raised some interesting points:
(1) If we have multiple CXL switches (especially in a complex hierarchy), then fabric
management becomes a very compute-intensive activity. So, potentially, a firmware-side
FM may not be able to handle all of its responsibilities without performance
degradation.
(2) However, if we have the FM on the host side, then there are security concerns because
the FM sees everything, including all details of multiple hosts and subsystems.
(3) Technically speaking, a user-space FM daemon could potentially run on the host
side as well as on the CXL switch side. I mean that if we implement a
user-space FM daemon, then it could also be used to execute FM functionality on the
CXL switch side (maybe????). :)

<skipped>

>>>>>   - Manage surprise removal of devices  
>>>> 
>>>> Likewise, beyond reporting I wouldn't expect the FM daemon to have any idea
>>>> what to do in the way of managing this.  Scream loudly?
>>>> 
>>> 
>>> Maybe it could require notifying application(s). Let’s imagine that an application
>>> uses some resources from the removed device. Maybe the FM can manage kernel-space
>>> metadata correction and help to handle application requests to non-existent
>>> entities.
>> 
>> Notifications for the host are likely to come via in-band means - so type3 driver
>> handling rather than anything related to the FM.  As far as the host is concerned,
>> this is the same as the case where there is no FM and someone ripped a device out.
>> 
>> There might indeed be metadata to manage, but I doubt it will have anything to
>> do with the kernel.
>> 
> 
> I've also had similar thoughts. I think the OS responds to notifications that
> are generated in-band after changes to the state of the FM are made through
> OOB means.
> 
> I envision that the host sends Redfish requests to a switch BMC that has an FM
> implementation. Once the changes are implemented by the FM, they would show up
> as changes to the PCIe hierarchy on a host, which is capable of responding to
> such changes.
> 

I think I don't completely follow your point. :) First of all, I assume that if the host
sends a Redfish request, then it will expect confirmation of the request's execution.
To me, this means that the host needs to receive some packet indicating whether the
request executed successfully or failed. It also means that some subsystem or application
requested this change, and only after receiving the confirmation can the requested
capabilities be used. And if the FM is on the CXL switch side, then how will the FM show
the changes? It sounds to me like some FM subsystem should be on the host side to receive
the confirmation/notification and to execute the actual changes in the PCIe hierarchy.
Am I missing something here?

Thanks,
Slava.





