Re: [PATCH v17 0/5] FPGA Image Load (previously Security Manager)

Russ Weight <russell.h.weight@xxxxxxxxx> · Fri, 15 Oct 2021 10:34:23 -0700

On 10/14/21 7:51 PM, Xu Yilun wrote:
> On Thu, Oct 14, 2021 at 09:32:53AM -0700, Russ Weight wrote:
>>
>> On 10/13/21 6:49 PM, Xu Yilun wrote:
>>> On Wed, Oct 13, 2021 at 11:09:08AM -0700, Russ Weight wrote:
>>>> On 10/12/21 6:06 PM, Xu Yilun wrote:
>>>>> On Tue, Oct 12, 2021 at 10:20:15AM -0700, Russ Weight wrote:
>>>>>> On 10/12/21 12:47 AM, Xu Yilun wrote:
>>>>>>> On Mon, Oct 11, 2021 at 06:00:16PM -0700, Russ Weight wrote:
>>>>>>>> On 10/11/21 5:35 AM, Tom Rix wrote:
>>>>>>>>> On 10/10/21 6:41 PM, Xu Yilun wrote:
>>>>>>>>>> On Sat, Oct 09, 2021 at 05:11:20AM -0700, Tom Rix wrote:
>>>>>>>>>>> On 10/9/21 1:08 AM, Xu Yilun wrote:
>>>>>>>>>>>> On Wed, Sep 29, 2021 at 04:00:20PM -0700, Russ Weight wrote:
>>>>>>>>>>>>> The FPGA Image Load framework provides an API to upload image
>>>>>>>>>>>>> files to an FPGA device. Image files are self-describing. They could
>>>>>>>>>>>>> contain FPGA images, BMC images, Root Entry Hashes, or other device
>>>>>>>>>>>>> specific files. It is up to the lower-level device driver and the
>>>>>>>>>>>>> target device to authenticate and disposition the file data.
>>>>>>>>>>>> I've reconsider the FPGA persistent image update again, and think we
>>>>>>>>>>>> may include it in FPGA manager framework.
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry I raised this topic again when it is already at patch v17, but now
>>>>>>>>>>>> I need to consider more seriously than before.
>>>>>>>>>>>>
>>>>>>>>>>>> We have consensus the FPGA persistent image update is just like a normal
>>>>>>>>>>>> firmware update which finally writes the nvmem like flash or eeprom,
>>>>>>>>>>>> while the current FPGA manager deals with the active FPGA region update
>>>>>>>>>>>> and re-activation. Could we just expand the FPGA manager and let it handle
>>>>>>>>>>>> the nvmem update as well? Many FPGA cards have nvmem and downloaders
>>>>>>>>>>>> supports updating both FPGA region and nvmem.
>>>>>>>> The fpga-image-load driver is actually just a data transfer. The class
>>>>>>> IMHO, The fpga-mgr dev is also a data transfer. The fpga-region dev is
>>>>>>> acting as the FPGA region admin responsible for gating, transfering and
>>>>>>> re-enumerating.
>>>>>>>
>>>>>>> So my opinion is to add a new data transfer type and keep a unified process.
>>>>>>>
>>>>>>>> driver has no knowledge about what the data is or where/if the data will
>>>>>>>> be stored.
>>>>>>> The fpga-image-load driver knows the data will be stored in nvmem,
>>>>>> FYI: This is not strictly correct. In a coming product there is a
>>>>>> case where the data will be stored in RAM. Richard Gong was also
>>>>>> looking to use this driver to validate an image without programming
>>>>>> or storing it. The fpga-image-load driver has no expectation that
>>>>>> the data will be stored in nvmem, or even that it will be stored
>>>>>> at all.
>>>>> OK, we can discuss that use case then. But fundamentally a driver should
>>>>> be clear what it is doing.
>>>> The lower-level driver is, of course, clear what it is doing. And the
>>>> FPGA Image Load Framework simply provides a consistent API and manages
>>>> a potentially long-running data transfer in the context of a kernel
>>>> worker thread.
>>>>
>>>> It sounds like you are saying that that is not "clear enough" in the
>>>> context of the FPGA Manager?
>>>>
>>>> The files that are used with Intel PAC devices are self-describing. The
>>>> user-space tools, the class driver and the lower-level driver just pass
>>>> the data through to the card BMC without any knowledge of the content,
>>>> purpose or final destination of the data.
>>>>
>>>> The card BMC will receive signed data, validate it, and process it as a
>>>> BMC image, an FPGA image, a Root Entry Hash, or a key cancellation. In
>>> I category all these actions as firmware update fully or partially on
>>> persistent storage. The FPGA Manager don't have to know the meaning of
>>> every byte on flash, but it should be aware the firmware is updated and
>>> the card may acts differently with a new firmware. This is the common
>>> working model for most of the FPGA cards so that we implement it in FPGA
>>> manager class. 
>>>
>>>> the n6000, it could also be part of a multi-step process for programming
>>>> SDM keys and the data may not be stored permanently.
>>> This is new to me, but seems to be different from firmware update, so lets
>>> think about it again.
>>>
>>>>> You may try to extend the FPGA framework to
>>>>> support nvmem storage, or image validation, but cannot say we feed the
>>>>> data to any engine undefined by the framework.
>>>> I'm not sure what you mean by "feed the data to any engine undefined by the
>>>> framework". I think the "engine" is the lower level driver/device that invokes
>>>> the fpga_mgr. The lower level driver, of course, is clear what it is doing.
>>>> The fpga_mgr cannot control what driver invokes it.
>>>>
>>>> Are saying that when invoking the fpga-mgr, that it _must_ also pass descriptive
>>>> data. Meaning that a self-describing file alone is not acceptable?
>>> The class driver should define a reasonable working model and APIs.
>>> Updating the FPGA backup storage is good to me. But receiving a mystery
>>> box and do whatever it requires is not.
>>>
>>> Self-describing file is OK, encryption is OK, but either the class
>>> driver itself, or with the help of the low level driver, should make
>>> sure it works within its scope.
>> In our secure update process, the card BMC firmware authenticates
>> the data using the root entry hashes and will either reject the
>> data or perform some function based on the contents. Neither the
>> user-space, the class driver, nor the lower level driver know
>> what the contents are. It _is_ a "mystery box" to them. How do we
>> verify scope in this model?
> I think we need to find out how. One case is, the HW is designed to have
> one single function, such as firmware update, then any image input
> through firmware update API is within expectation, and the driver
> should only serve the firmware update API. I think this is how the
> N3000 is working now. If the HW is for another function, register itself
> to serve another API, or another class driver.
>
> Another case is, the HW could do multiple types of tasks depending on
> the content of the image, such as firmware update, image verification,
> or assumably power off the card ... There should be some mechanism for
> the driver to only accept the right image according to what API is called.
> Or the user may input an image named update_the_card.img through
> firmware update API and finally get the card off. Having some headers
> readable by host for the operation type? Or, some HW interface for host
> to apply the operation type as well as the image, let the HW verify?
> Let's think about it.
I'm not sure if I am following your thinking here. The HW, of course,
verifies authentication of the image and acts according to the image
type. I don't think it is reasonable to require the class driver to
interpret the data to determine what it is. That implies that the
class driver would have to know the header format and possible values.
It also means that changes to the header format would require patches
to the class driver.

The FPGA card is trusted by virtue of the fact that the customer
purchased it and physically placed it in the machine. If the FPGA card
(or the lower level driver) validates the image, then why does the
Class driver need to be concerned about the file type? I think the
purpose of the class driver is primarily to provide a common API and
perform common functions so that they don't have to be replicated
among similar low-level drivers. It is up to the low-level driver
or the device itself to ensure that the data received is acceptable.

If the card receives data through the fpga-mgr upload facility that
isn't strictly a firmware update, and if the lower-level driver or
the card handles it and returns appropriate status - is that really
a problem?
>> As you have noted, most current cases result in a change to the
>> card, and I suspect that it will remain that way. But that can't be
>> guaranteed, and I'm not convinced that a host driver needs to be
>> concerned about it.
> A host driver should know what is done, in some abstraction level.
> I think updating the persistent storage is an acceptable abstraction
> in FPGA domain, so I'd like to extend it in FPGA manager. But doing
> anything according to the image is not.
By host driver, do you mean the class driver? Or the lower-level device
driver?

It seems to me that you are saying that self-describing images are not
acceptable? Or at least they are not acceptable payload for an FPGA
Manager firmware-update API?

The FPGA Image Load Framework was designed with the concept of
transferring data to a device without imposing a purpose on the data.
The expectation is that the lower-level driver or the device will
validate the data. Is there something fundamentally wrong with that
approach? And if not, why couldn't we incorporate a similar image_load
API into the FPGA Manager?

- Russ

>
> Thanks,
> Yilun
>
>> - Russ
>>
>>> Thanks,
>>> Yilun
>>>
>>>> Thanks,
>>>> - Russ
>>>>
>>>>> Thanks,
>>>>> Yilun
>>>>>
>>>>>>> while
>>>>>>> the fpga-mgr knows the data will be stored in FPGA cells. They may need
>>>>>>> to know the exact physical position to store, may not, depends on what the
>>>>>>> HW engines are.
>>>>>>>
>>>>>>>> This functionality could, of course, be merged into the fpga-mgr. I did
>>>>>>>> a proof of concept of this a while back and we discussed the pros and cons.
>>>>>>>> See this email for a recap:
>>>>>>>>
>>>>>>>> https://marc.info/?l=linux-fpga&m=161998085507374&w=2
>>>>>>>>
>>>>>>>> Things have changed some with the evolution of the driver. The IOCTL
>>>>>>>> approach probably fits better than the sysfs implementation. At the time
>>>>>>>> it seemed that a merge would add unnecessary complexity without adding value.
>>>>>>> I think at least developers don't have to go through 2 sets of software
>>>>>>> stacks which are of the same concept. And adding some new features like
>>>>>>> optionally threading or canceling data transfer are also good to FPGA
>>>>>>> region update. And the nvmem update could also be benifit from exsiting
>>>>>>> implementations like scatter-gather buffers, in-kernel firmware loading.
>>>>>>>
>>>>>>> I try to explain myself according to each of your concern from that mail
>>>>>>> thread:
>>>>>>>
>>>>>>> Purpose of the 2 updates
>>>>>>> ========================
>>>>>>>
>>>>>>>   As I said before, I think they are both data transfer devices. FPGA
>>>>>>> region update gets extra support from fpga-region & fpga-bridge, and
>>>>>>> FPGA nvmem update could be a standalone fpga-mgr.
>>>>>>>
>>>>>>> Extra APIs that are unique to nvmem update
>>>>>>> ==========================================
>>>>>>>
>>>>>>>   cdev APIs for nvmem update:
>>>>>>>     Yes we need to expand the functionality so we need them.
>>>>>>>
>>>>>>>   available_images, image_load APIs for loading nvmem content to FPGA
>>>>>>>   region:
>>>>>>>     These are features in later patchsets which are not submitted, but we
>>>>>>>     could talk about it in advance. I think this is actually a FPGA region
>>>>>>>     update from nvmem, it also requires gating, data loading (no SW transfer)
>>>>>>>     and re-enumeration, or a single command to image_load HW may result system
>>>>>>>     disordered. The FPGA framework now only supports update from in-kernel
>>>>>>>     user data, maybe we add support for update from nvmem later.
>>>>>>>
>>>>>>>   fpga-mgr state extend:
>>>>>>>     I think it could be extended, The current states are not perfect,
>>>>>>>     adding something like IDLE or READY is just fine.
>>>>>>>
>>>>>>>   fpga-mgr status extend:
>>>>>>>     Add general error definitions as needed. If there is some status
>>>>>>>     that is quite vendor specific, expose it in low-level driver.
>>>>>>>
>>>>>>>   remaining-size:
>>>>>>>     Nice to have for all.
>>>>>>>
>>>>>>> Threading the update
>>>>>>> ====================
>>>>>>>
>>>>>>>   Also a good option for FPGA region update, maybe we also have a slow FPGA
>>>>>>>   reprogrammer?
>>>>>>>
>>>>>>> Cancelling the update
>>>>>>> ====================
>>>>>>>
>>>>>>>   Also a good option for FPGA region update if HW engine supports.
>>>>>> These are all good points.
>>>>>>
>>>>>> Thanks,
>>>>>> - Russ
>>>>>>> Thanks,
>>>>>>> Yilun
>>>>>>>
>>>>>>>>>>>> According to the patchset, the basic workflow of the 2 update types are
>>>>>>>>>>>> quite similar, get the data, prepare for the HW, write and complete.
>>>>>>>>>>>> They are already implemented in FPGA manager. We've discussed some
>>>>>>>>>>>> differences like threading or canceling the update, which are
>>>>>>>>>>>> not provided by FPGA manager but they may also nice to have for FPGA
>>>>>>>>>>>> region update. An FPGA region update may also last for a long time??
>>>>>>>>>>>> So I think having 2 sets of similar frameworks in FPGA is unnecessary.
>>>>>>>>>>>>
>>>>>>>>>>>> My quick mind is that we add some flags in struct fpga_mgr & struct
>>>>>>>>>>>> fpga_image_info to indicate the HW capability (support FPGA region
>>>>>>>>>>>> update or nvmem update or both) of the download engine and the provided
>>>>>>>>>>>> image type. Then the low-level driver knows how to download if it
>>>>>>>>>>>> supports both image types.An char device could be added for each fpga manager dev, providing the
>>>>>>>>>>>> user APIs for nvmem update. We may not use the char dev for FPGA region
>>>>>>>>>>>> update cause it changes the system HW devices and needs device hotplug
>>>>>>>>>>>> in FPGA region. We'd better leave it to FPGA region class, this is
>>>>>>>>>>>> another topic.
>>>>>>>> I'll give this some more thought and see if I can come up with some RFC
>>>>>>>> patches.
>>>>>>>>
>>>>>>>> - Russ
>>>>>>>>>>>> More discussion is appreciated.
>>>>>>>>>>> I also think fpga_mgr could be extended.
>>>>>>>>>>>
>>>>>>>>>>> In this patchset,
>>>>>>>>>>>
>>>>>>>>>>> https://lore.kernel.org/linux-fpga/20210625195849.837976-1-trix@xxxxxxxxxx/
>>>>>>>>>>>
>>>>>>>>>>> A second, similar set of write ops was added to fpga_manger_ops,
>>>>>>>>>>>
>>>>>>>>>>> new bit/flag was added to fpga_image_info
>>>>>>>>>>>
>>>>>>>>>>> The intent was for dfl to add their specific ops to cover what is done in
>>>>>>>>>>> this patchset.
>>>>>>>>>> I think we don't have to add 2 ops for reconfig & reimage in framework,
>>>>>>>>>> the 2 processes are almost the same.
>>>>>>>>>>
>>>>>>>>>> Just add the _REIMAGE (or something else, NVMEM?) flag for
>>>>>>>>>> fpga_image_info, and low level drivers handle it as they do for other
>>>>>>>>>> flags.
>>>>>>>>>>
>>>>>>>>>> How do you think?
>>>>>>>>> A single set is fine.
>>>>>>>>>
>>>>>>>>> A difficult part of is the length of  time to do the write. The existing write should be improved to use a worker thread.
>>>>>>>>>
>>>>>>>>> Tom
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Yilun
>>>>>>>>>>
>>>>>>>>>>> Any other driver would do similar.
>>>>>>>>>>>
>>>>>>>>>>> Is this close to what you are thinking ?
>>>>>>>>>>>
>>>>>>>>>>> Tom
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Yilun
>>>>>>>>>>>>