RE: [PATCH v3 1/2] fpga: m10bmc-sec: add sysfs to reload FPGA/BMC images

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: matthew.gerlach@xxxxxxxxxxxxxxx <matthew.gerlach@xxxxxxxxxxxxxxx>
> Sent: Tuesday, August 9, 2022 7:39 AM
> To: Xu, Yilun <yilun.xu@xxxxxxxxx>
> Cc: Zhang, Tianfei <tianfei.zhang@xxxxxxxxx>; mdf@xxxxxxxxxx; linux-
> fpga@xxxxxxxxxxxxxxx; lee.jones@xxxxxxxxxx; Weight, Russell H
> <russell.h.weight@xxxxxxxxx>; Wu, Hao <hao.wu@xxxxxxxxx>; trix@xxxxxxxxxx
> Subject: Re: [PATCH v3 1/2] fpga: m10bmc-sec: add sysfs to reload FPGA/BMC
> images
> 
> 
> 
> On Mon, 8 Aug 2022, Xu Yilun wrote:
> 
> > On 2022-08-08 at 01:33:16 -0400, Tianfei Zhang wrote:
> >> From: Russ Weight <russell.h.weight@xxxxxxxxx>
> >>
> >> Add the available_images and image_load sysfs files. The
> >> available_images file returns a space separated list of key words
> >> that may be written into the image_load file. These keywords describe
> >> an FPGA, BMC, or firmware image in FLASH or EEPROM storage that may be
> loaded.
> >>
> >> The image_load sysfs file may be written with a key word to trigger a
> >> reload of an FPGA, BMC, or firmware image from FLASH or EEPROM.
> >>
> >> Signed-off-by: Russ Weight <russell.h.weight@xxxxxxxxx>
> >> Signed-off-by: Tianfei Zhang <tianfei.zhang@xxxxxxxxx>
> >> ---
> >> v3:
> >> uses regmap_update_bits() API instead of m10bmc_sys_update_bits().
> >> v2:
> >> adds the steps for how to use the "image_load" sysfs file.
> >> ---
> >>  .../sysfs-driver-intel-m10-bmc-sec-update     |  34 ++++++
> >>  drivers/fpga/intel-m10-bmc-sec-update.c       | 105 ++++++++++++++++++
> >>  2 files changed, 139 insertions(+)
> >>
> >> diff --git
> >> a/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update
> >> b/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update
> >> index 0a41afe0ab4c..3d8f04ca6f1b 100644
> >> --- a/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update
> >> +++ b/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update
> >> @@ -59,3 +59,37 @@ Contact:	Russ Weight <russell.h.weight@xxxxxxxxx>
> >>  Description:	Read only. Returns number of times the secure update
> >>  		staging area has been flashed.
> >>  		Format: "%u".
> >> +
> >> +What:		/sys/bus/platform/drivers/intel-m10bmc-sec-
> update/.../control/available_images
> >> +Date:		July 2022
> >> +KernelVersion:  5.20
> >> +Contact:	Russ Weight <russell.h.weight@xxxxxxxxx>
> >> +Description:	Read-only. This file returns a space separated list of
> >> +		key words that may be written into the image_load file
> >> +		described below. These keywords decribe an FPGA, BMC,
> >> +		or firmware image in FLASH or EEPROM storage that may
> >> +		be loaded.
> >> +
> >> +What:		/sys/bus/platform/drivers/intel-m10bmc-sec-
> update/.../control/image_load
> >> +Date:		July 2022
> >> +KernelVersion:  5.20
> >> +Contact:	Russ Weight <russell.h.weight@xxxxxxxxx>
> >> +Description:	Write-only. A key word may be written to this file to
> >> +		trigger a reload of an FPGA, BMC, or firmware image from
> >> +		FLASH or EEPROM. Refer to the available_images file for a
> >> +		list of supported key words for the underlying device.
> >> +		Writing an unsupported string to this file will result in
> >> +		EINVAL being returned.
> >> +		It should remove all of resources related to the old FPGA/BMC
> >> +		image before trigger the image reload otherwise the host system
> >> +		may crash. We recommended that follow the below steps or
> directly
> >
> > I suggest we solve this concern first before other detailed refinements.

I have run the stress test on N3000 and N6000 card by only directly write string to " image_load"  sysfs file, 
and observed  that there are no Host System crash occurred, but the BMC functionality failed, I think this is 
the expected behavior  that is why we strongly recommended that end-user use our RSU script or follow our 
provided steps.

> >
> > My opinion is, don't make the sysfs interface dependent of other user
> > interfaces, like the following:
> >
> >> +		use the OPAE RSU script to perform the reload for FPGA/BMC
> image.
> >> +		Here is the steps to trigger the reload for FPGA/BMC image:
> >> +		1. disable the AER of the FPGA card.
> >> +		2. unbind the PFs/VFs which have bound with VFIO/UIO driver.
> >> +		3. trigger image reload via "image_load" sysfs file.
> >> +		4. remove all of PCI devices of the FPGA card via
> >> +		"/sys/bus/pci/devices/xxxx/remove" sysfs file.
> >> +		5. wait 10 seconds.
> >> +		6. re-scan the PCI bus via "/sys/bus/pci/rescan" sysfs file.
> >> +	        7. enable the AER of the FPGA card.
> >
> > It is not a good idea the writing of the sysfs node crashes the
> > system, if we don't follow the whole steps.
> >
> > Thanks,
> > Yilun
> 
> Hi Yilun,
> 
> The use case being supported with this trigger is the ability to reconfigure a FPGA or
> other programmable componenet on a board without the requiring the HW platform
> be able to power cycle a PCI slot or power cycling the whole system.  Unfortunately,
> when a FPGA connected to a PCI bus is reconfigured, it can cause a PCI error.  The
> actual pci error, if any, and any mitigation steps to handle the error is platform
> specific and dependent on the FPGA image itself.  Therefore predicting and
> implementing all necessary error mitigation in the kernel as part of the trigger would
> be an impossible task.

I agree with Matthew's comment.
How about we change the documentation as below:

It should remove all of resources related to the old FPGA/BMC
image before trigger the image reload otherwise the functionalities 
of FPGA/BMC will fail. We recommended that follow the below steps 
or directly use the OPAE RSU script to perform the reload for FPGA/BMC
image.
Here is the steps to trigger the reload for FPGA/BMC image:
1. disable the AER of the FPGA card.
2. unbind the PFs/VFs which have bound with VFIO/UIO driver.
3. trigger image reload via "image_load" sysfs file.
4. remove all of PCI devices of the FPGA card via
"/sys/bus/pci/devices/xxxx/remove" sysfs file.
5. wait 10 seconds.
6. re-scan the PCI bus via "/sys/bus/pci/rescan" sysfs file.
7. enable the AER of the FPGA card.



> 
> Matthew
> 
> >
> >> diff --git a/drivers/fpga/intel-m10-bmc-sec-update.c b/drivers/fpga/intel-m10-
> bmc-sec-update.c
> >> index 72c677c910de..3a082911cf67 100644
> >> --- a/drivers/fpga/intel-m10-bmc-sec-update.c
> >> +++ b/drivers/fpga/intel-m10-bmc-sec-update.c
> >> @@ -14,6 +14,8 @@
> >>  #include <linux/platform_device.h>
> >>  #include <linux/slab.h>
> >>
> >> +struct image_load;
> >> +
> >>  struct m10bmc_sec {
> >>  	struct device *dev;
> >>  	struct intel_m10bmc *m10bmc;
> >> @@ -21,6 +23,12 @@ struct m10bmc_sec {
> >>  	char *fw_name;
> >>  	u32 fw_name_id;
> >>  	bool cancel_request;
> >> +	struct image_load *image_load;	/* terminated with { } member */
> >> +};
> >> +
> >> +struct image_load {
> >> +	const char *name;
> >> +	int (*load_image)(struct m10bmc_sec *sec);
> >>  };
> >>
> >>  static DEFINE_XARRAY_ALLOC(fw_upload_xa);
> >> @@ -137,6 +145,54 @@ DEVICE_ATTR_SEC_CSK_RO(pr, PR_PROG_ADDR +
> CSK_VEC_OFFSET);
> >>
> >>  #define FLASH_COUNT_SIZE 4096	/* count stored as inverted bit vector */
> >>
> >> +static ssize_t available_images_show(struct device *dev,
> >> +				     struct device_attribute *attr, char *buf)
> >> +{
> >> +	struct m10bmc_sec *sec = dev_get_drvdata(dev);
> >> +	const struct image_load *hndlr;
> >> +	ssize_t count = 0;
> >> +
> >> +	for (hndlr = sec->image_load; hndlr->name; hndlr++) {
> >> +		count += scnprintf(buf + count, PAGE_SIZE - count,
> >> +				   "%s ", hndlr->name);
> >> +	}
> >> +
> >> +	buf[count - 1] = '\n';
> >> +
> >> +	return count;
> >> +}
> >> +static DEVICE_ATTR_RO(available_images);
> >> +
> >> +static ssize_t image_load_store(struct device *dev,
> >> +				struct device_attribute *attr,
> >> +				const char *buf, size_t count)
> >> +{
> >> +	struct m10bmc_sec *sec = dev_get_drvdata(dev);
> >> +	const struct image_load *hndlr;
> >> +	int ret = -EINVAL;
> >> +
> >> +	for (hndlr = sec->image_load; hndlr->name; hndlr++) {
> >> +		if (sysfs_streq(buf, hndlr->name)) {
> >> +			ret = hndlr->load_image(sec);
> >> +			break;
> >> +		}
> >> +	}
> >> +
> >> +	return ret ? : count;
> >> +}
> >> +static DEVICE_ATTR_WO(image_load);
> >> +
> >> +static struct attribute *m10bmc_control_attrs[] = {
> >> +	&dev_attr_available_images.attr,
> >> +	&dev_attr_image_load.attr,
> >> +	NULL,
> >> +};
> >> +
> >> +static struct attribute_group m10bmc_control_attr_group = {
> >> +	.name = "control",
> >> +	.attrs = m10bmc_control_attrs,
> >> +};
> >> +
> >>  static ssize_t flash_count_show(struct device *dev,
> >>  				struct device_attribute *attr, char *buf)
> >>  {
> >> @@ -195,6 +251,7 @@ static struct attribute_group
> m10bmc_security_attr_group = {
> >>
> >>  static const struct attribute_group *m10bmc_sec_attr_groups[] = {
> >>  	&m10bmc_security_attr_group,
> >> +	&m10bmc_control_attr_group,
> >>  	NULL,
> >>  };
> >>
> >> @@ -208,6 +265,53 @@ static void log_error_regs(struct m10bmc_sec *sec, u32
> doorbell)
> >>  		dev_err(sec->dev, "RSU auth result: 0x%08x\n", auth_result);
> >>  }
> >>
> >> +static int m10bmc_sec_bmc_image_load(struct m10bmc_sec *sec,
> >> +				     unsigned int val)
> >> +{
> >> +	u32 doorbell;
> >> +	int ret;
> >> +
> >> +	if (val > 1) {
> >> +		dev_err(sec->dev, "invalid reload val = %d\n", val);
> >> +		return -EINVAL;
> >> +	}
> >> +
> >> +	ret = m10bmc_sys_read(sec->m10bmc, M10BMC_DOORBELL, &doorbell);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	if (doorbell & DRBL_REBOOT_DISABLED)
> >> +		return -EBUSY;
> >> +
> >> +	return regmap_update_bits(sec->m10bmc->regmap,
> >> +				  M10BMC_SYS_BASE + M10BMC_DOORBELL,
> >> +				  DRBL_CONFIG_SEL | DRBL_REBOOT_REQ,
> >> +				  FIELD_PREP(DRBL_CONFIG_SEL, val) |
> >> +				  DRBL_REBOOT_REQ);
> >> +}
> >> +
> >> +static int m10bmc_sec_bmc_image_load_0(struct m10bmc_sec *sec)
> >> +{
> >> +	return m10bmc_sec_bmc_image_load(sec, 0);
> >> +}
> >> +
> >> +static int m10bmc_sec_bmc_image_load_1(struct m10bmc_sec *sec)
> >> +{
> >> +	return m10bmc_sec_bmc_image_load(sec, 1);
> >> +}
> >> +
> >> +static struct image_load m10bmc_image_load_hndlrs[] = {
> >> +	{
> >> +		.name = "bmc_factory",
> >> +		.load_image = m10bmc_sec_bmc_image_load_1,
> >> +	},
> >> +	{
> >> +		.name = "bmc_user",
> >> +		.load_image = m10bmc_sec_bmc_image_load_0,
> >> +	},
> >> +	{}
> >> +};
> >> +
> >>  static enum fw_upload_err rsu_check_idle(struct m10bmc_sec *sec)
> >>  {
> >>  	u32 doorbell;
> >> @@ -565,6 +669,7 @@ static int m10bmc_sec_probe(struct platform_device
> *pdev)
> >>  	sec->dev = &pdev->dev;
> >>  	sec->m10bmc = dev_get_drvdata(pdev->dev.parent);
> >>  	dev_set_drvdata(&pdev->dev, sec);
> >> +	sec->image_load = m10bmc_image_load_hndlrs;
> >>
> >>  	ret = xa_alloc(&fw_upload_xa, &sec->fw_name_id, sec,
> >>  		       xa_limit_32b, GFP_KERNEL);
> >> --
> >> 2.26.2
> >>
> >




[Index of Archives]     [LM Sensors]     [Linux Sound]     [ALSA Users]     [ALSA Devel]     [Linux Audio Users]     [Linux Media]     [Kernel]     [Gimp]     [Yosemite News]     [Linux Media]

  Powered by Linux