On Wed, 10 Aug 2022, Xu Yilun wrote:
Error handling is unpredictable and platform specific for two reasons. First, the overall implementation of PCI bus can have differences.On 2022-08-08 at 16:39:23 -0700, matthew.gerlach@xxxxxxxxxxxxxxx wrote:On Mon, 8 Aug 2022, Xu Yilun wrote:On 2022-08-08 at 01:33:16 -0400, Tianfei Zhang wrote:From: Russ Weight <russell.h.weight@xxxxxxxxx> Add the available_images and image_load sysfs files. The available_images file returns a space separated list of key words that may be written into the image_load file. These keywords describe an FPGA, BMC, or firmware image in FLASH or EEPROM storage that may be loaded. The image_load sysfs file may be written with a key word to trigger a reload of an FPGA, BMC, or firmware image from FLASH or EEPROM. Signed-off-by: Russ Weight <russell.h.weight@xxxxxxxxx> Signed-off-by: Tianfei Zhang <tianfei.zhang@xxxxxxxxx> --- v3: uses regmap_update_bits() API instead of m10bmc_sys_update_bits(). v2: adds the steps for how to use the "image_load" sysfs file. --- .../sysfs-driver-intel-m10-bmc-sec-update | 34 ++++++ drivers/fpga/intel-m10-bmc-sec-update.c | 105 ++++++++++++++++++ 2 files changed, 139 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update b/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update index 0a41afe0ab4c..3d8f04ca6f1b 100644 --- a/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update +++ b/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update @@ -59,3 +59,37 @@ Contact: Russ Weight <russell.h.weight@xxxxxxxxx> Description: Read only. Returns number of times the secure update staging area has been flashed. Format: "%u". + +What: /sys/bus/platform/drivers/intel-m10bmc-sec-update/.../control/available_images +Date: July 2022 +KernelVersion: 5.20 +Contact: Russ Weight <russell.h.weight@xxxxxxxxx> +Description: Read-only. This file returns a space separated list of + key words that may be written into the image_load file + described below. These keywords decribe an FPGA, BMC, + or firmware image in FLASH or EEPROM storage that may + be loaded. + +What: /sys/bus/platform/drivers/intel-m10bmc-sec-update/.../control/image_load +Date: July 2022 +KernelVersion: 5.20 +Contact: Russ Weight <russell.h.weight@xxxxxxxxx> +Description: Write-only. A key word may be written to this file to + trigger a reload of an FPGA, BMC, or firmware image from + FLASH or EEPROM. Refer to the available_images file for a + list of supported key words for the underlying device. + Writing an unsupported string to this file will result in + EINVAL being returned. + It should remove all of resources related to the old FPGA/BMC + image before trigger the image reload otherwise the host system + may crash. We recommended that follow the below steps or directlyI suggest we solve this concern first before other detailed refinements. My opinion is, don't make the sysfs interface dependent of other user interfaces, like the following:+ use the OPAE RSU script to perform the reload for FPGA/BMC image. + Here is the steps to trigger the reload for FPGA/BMC image: + 1. disable the AER of the FPGA card. + 2. unbind the PFs/VFs which have bound with VFIO/UIO driver. + 3. trigger image reload via "image_load" sysfs file. + 4. remove all of PCI devices of the FPGA card via + "/sys/bus/pci/devices/xxxx/remove" sysfs file. + 5. wait 10 seconds. + 6. re-scan the PCI bus via "/sys/bus/pci/rescan" sysfs file. + 7. enable the AER of the FPGA card.It is not a good idea the writing of the sysfs node crashes the system, if we don't follow the whole steps. Thanks, YilunHi Yilun, The use case being supported with this trigger is the ability to reconfigure a FPGA or other programmable componenet on a board without the requiring the HW platform be able to power cycle a PCI slot or power cycling the whole system. Unfortunately, when a FPGA connected to a PCI bus is reconfigured, it can cause a PCI error. The actual pci error, if any, and any mitigation steps to handle the error is platform specific and dependent on the FPGA image itself. Therefore predicting and implementing all necessary errorWhy the error handling is unpredictable?
For example, Advanced Error Reporting and/or Downstream Port Containment are not implemented on every platform. Secondly, the actual error caused the triggering can be specific to a board and/or a FPGA design on the board.
mitigation in the kernel as part of the trigger would be an impossible task.Or could we just gate the pcie link? Just like we should disable fpga bridges before reprogramming any fpga region.
The ability to safely gate a pcie link is also a platform specific feature. While platforms exists that provide the ability for safe PCIe hot swap, many do not. Keeping as little of the overall safe sequence of FPGA reconfiguration in the kernel allows for the greatest flexibility to implement the platform specific part of the safe sequence in user space.
>
Actually I did find something for link disabing which may be useful. https://patchwork.kernel.org/project/linux-pci/patch/20190529104942.74991-1-mika.westerberg@xxxxxxxxxxxxxxx/ Thanks, YilunMatthewdiff --git a/drivers/fpga/intel-m10-bmc-sec-update.c b/drivers/fpga/intel-m10-bmc-sec-update.c index 72c677c910de..3a082911cf67 100644 --- a/drivers/fpga/intel-m10-bmc-sec-update.c +++ b/drivers/fpga/intel-m10-bmc-sec-update.c @@ -14,6 +14,8 @@ #include <linux/platform_device.h> #include <linux/slab.h> +struct image_load; + struct m10bmc_sec { struct device *dev; struct intel_m10bmc *m10bmc; @@ -21,6 +23,12 @@ struct m10bmc_sec { char *fw_name; u32 fw_name_id; bool cancel_request; + struct image_load *image_load; /* terminated with { } member */ +}; + +struct image_load { + const char *name; + int (*load_image)(struct m10bmc_sec *sec); }; static DEFINE_XARRAY_ALLOC(fw_upload_xa); @@ -137,6 +145,54 @@ DEVICE_ATTR_SEC_CSK_RO(pr, PR_PROG_ADDR + CSK_VEC_OFFSET); #define FLASH_COUNT_SIZE 4096 /* count stored as inverted bit vector */ +static ssize_t available_images_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct m10bmc_sec *sec = dev_get_drvdata(dev); + const struct image_load *hndlr; + ssize_t count = 0; + + for (hndlr = sec->image_load; hndlr->name; hndlr++) { + count += scnprintf(buf + count, PAGE_SIZE - count, + "%s ", hndlr->name); + } + + buf[count - 1] = '\n'; + + return count; +} +static DEVICE_ATTR_RO(available_images); + +static ssize_t image_load_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct m10bmc_sec *sec = dev_get_drvdata(dev); + const struct image_load *hndlr; + int ret = -EINVAL; + + for (hndlr = sec->image_load; hndlr->name; hndlr++) { + if (sysfs_streq(buf, hndlr->name)) { + ret = hndlr->load_image(sec); + break; + } + } + + return ret ? : count; +} +static DEVICE_ATTR_WO(image_load); + +static struct attribute *m10bmc_control_attrs[] = { + &dev_attr_available_images.attr, + &dev_attr_image_load.attr, + NULL, +}; + +static struct attribute_group m10bmc_control_attr_group = { + .name = "control", + .attrs = m10bmc_control_attrs, +}; + static ssize_t flash_count_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -195,6 +251,7 @@ static struct attribute_group m10bmc_security_attr_group = { static const struct attribute_group *m10bmc_sec_attr_groups[] = { &m10bmc_security_attr_group, + &m10bmc_control_attr_group, NULL, }; @@ -208,6 +265,53 @@ static void log_error_regs(struct m10bmc_sec *sec, u32 doorbell) dev_err(sec->dev, "RSU auth result: 0x%08x\n", auth_result); } +static int m10bmc_sec_bmc_image_load(struct m10bmc_sec *sec, + unsigned int val) +{ + u32 doorbell; + int ret; + + if (val > 1) { + dev_err(sec->dev, "invalid reload val = %d\n", val); + return -EINVAL; + } + + ret = m10bmc_sys_read(sec->m10bmc, M10BMC_DOORBELL, &doorbell); + if (ret) + return ret; + + if (doorbell & DRBL_REBOOT_DISABLED) + return -EBUSY; + + return regmap_update_bits(sec->m10bmc->regmap, + M10BMC_SYS_BASE + M10BMC_DOORBELL, + DRBL_CONFIG_SEL | DRBL_REBOOT_REQ, + FIELD_PREP(DRBL_CONFIG_SEL, val) | + DRBL_REBOOT_REQ); +} + +static int m10bmc_sec_bmc_image_load_0(struct m10bmc_sec *sec) +{ + return m10bmc_sec_bmc_image_load(sec, 0); +} + +static int m10bmc_sec_bmc_image_load_1(struct m10bmc_sec *sec) +{ + return m10bmc_sec_bmc_image_load(sec, 1); +} + +static struct image_load m10bmc_image_load_hndlrs[] = { + { + .name = "bmc_factory", + .load_image = m10bmc_sec_bmc_image_load_1, + }, + { + .name = "bmc_user", + .load_image = m10bmc_sec_bmc_image_load_0, + }, + {} +}; + static enum fw_upload_err rsu_check_idle(struct m10bmc_sec *sec) { u32 doorbell; @@ -565,6 +669,7 @@ static int m10bmc_sec_probe(struct platform_device *pdev) sec->dev = &pdev->dev; sec->m10bmc = dev_get_drvdata(pdev->dev.parent); dev_set_drvdata(&pdev->dev, sec); + sec->image_load = m10bmc_image_load_hndlrs; ret = xa_alloc(&fw_upload_xa, &sec->fw_name_id, sec, xa_limit_32b, GFP_KERNEL); -- 2.26.2