RE: [PATCH v3 1/2] fpga: m10bmc-sec: add sysfs to reload FPGA/BMC images

> -----Original Message-----
> From: Xu, Yilun <yilun.xu@xxxxxxxxx>
> Sent: Wednesday, August 10, 2022 12:31 AM
> To: matthew.gerlach@xxxxxxxxxxxxxxx
> Cc: Zhang, Tianfei <tianfei.zhang@xxxxxxxxx>; mdf@xxxxxxxxxx;
> linux-fpga@xxxxxxxxxxxxxxx; lee.jones@xxxxxxxxxx; Weight, Russell H
> <russell.h.weight@xxxxxxxxx>; Wu, Hao <hao.wu@xxxxxxxxx>; trix@xxxxxxxxxx
> Subject: Re: [PATCH v3 1/2] fpga: m10bmc-sec: add sysfs to reload FPGA/BMC
> images
> 
> On 2022-08-08 at 16:39:23 -0700, matthew.gerlach@xxxxxxxxxxxxxxx wrote:
> >
> >
> > On Mon, 8 Aug 2022, Xu Yilun wrote:
> >
> > > On 2022-08-08 at 01:33:16 -0400, Tianfei Zhang wrote:
> > > > From: Russ Weight <russell.h.weight@xxxxxxxxx>
> > > >
> > > > Add the available_images and image_load sysfs files. The
> > > > available_images file returns a space separated list of key words
> > > > that may be written into the image_load file. These keywords
> > > > describe an FPGA, BMC, or firmware image in FLASH or EEPROM storage
> > > > that may be loaded.
> > > >
> > > > The image_load sysfs file may be written with a key word to
> > > > trigger a reload of an FPGA, BMC, or firmware image from FLASH or EEPROM.
> > > >
> > > > Signed-off-by: Russ Weight <russell.h.weight@xxxxxxxxx>
> > > > Signed-off-by: Tianfei Zhang <tianfei.zhang@xxxxxxxxx>
> > > > ---
> > > > v3:
> > > > uses regmap_update_bits() API instead of m10bmc_sys_update_bits().
> > > > v2:
> > > > adds the steps for how to use the "image_load" sysfs file.
> > > > ---
> > > >  .../sysfs-driver-intel-m10-bmc-sec-update     |  34 ++++++
> > > >  drivers/fpga/intel-m10-bmc-sec-update.c       | 105 ++++++++++++++++++
> > > >  2 files changed, 139 insertions(+)
> > > >
> > > > diff --git a/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update b/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update
> > > > index 0a41afe0ab4c..3d8f04ca6f1b 100644
> > > > --- a/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update
> > > > +++ b/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update
> > > > @@ -59,3 +59,37 @@ Contact:	Russ Weight <russell.h.weight@xxxxxxxxx>
> > > >  Description:	Read only. Returns number of times the secure update
> > > >  		staging area has been flashed.
> > > >  		Format: "%u".
> > > > +
> > > > +What:		/sys/bus/platform/drivers/intel-m10bmc-sec-update/.../control/available_images
> > > > +Date:		July 2022
> > > > +KernelVersion:  5.20
> > > > +Contact:	Russ Weight <russell.h.weight@xxxxxxxxx>
> > > > +Description:	Read-only. This file returns a space separated list of
> > > > +		key words that may be written into the image_load file
> > > > +		described below. These keywords decribe an FPGA, BMC,
> > > > +		or firmware image in FLASH or EEPROM storage that may
> > > > +		be loaded.
> > > > +
> > > > +What:		/sys/bus/platform/drivers/intel-m10bmc-sec-update/.../control/image_load
> > > > +Date:		July 2022
> > > > +KernelVersion:  5.20
> > > > +Contact:	Russ Weight <russell.h.weight@xxxxxxxxx>
> > > > +Description:	Write-only. A key word may be written to this file to
> > > > +		trigger a reload of an FPGA, BMC, or firmware image from
> > > > +		FLASH or EEPROM. Refer to the available_images file for a
> > > > +		list of supported key words for the underlying device.
> > > > +		Writing an unsupported string to this file will result in
> > > > +		EINVAL being returned.
> > > > +		It should remove all of resources related to the old FPGA/BMC
> > > > +		image before trigger the image reload otherwise the host system
> > > > +		may crash. We recommended that follow the below steps or directly
> > >
> > > I suggest we solve this concern first before other detailed refinements.
> > >
> > > My opinion is, don't make the sysfs interface dependent on other
> > > user interfaces, like the following:
> > >
> > > > +		use the OPAE RSU script to perform the reload for FPGA/BMC image.
> > > > +		Here is the steps to trigger the reload for FPGA/BMC image:
> > > > +		1. disable the AER of the FPGA card.
> > > > +		2. unbind the PFs/VFs which have bound with VFIO/UIO driver.
> > > > +		3. trigger image reload via "image_load" sysfs file.
> > > > +		4. remove all of PCI devices of the FPGA card via
> > > > +		"/sys/bus/pci/devices/xxxx/remove" sysfs file.
> > > > +		5. wait 10 seconds.
> > > > +		6. re-scan the PCI bus via "/sys/bus/pci/rescan" sysfs file.
> > > > +	        7. enable the AER of the FPGA card.
> > >
> > > It is not a good idea for a write to the sysfs node to crash the
> > > system if we don't follow all of the steps.
> > >
> > > Thanks,
> > > Yilun
> >
> > Hi Yilun,
> >
> > The use case being supported with this trigger is the ability to
> > reconfigure an FPGA or other programmable component on a board without
> > requiring the HW platform to be able to power cycle a PCI slot, and
> > without power cycling the whole system.  Unfortunately, when an FPGA
> > connected to a PCI bus is reconfigured, it can cause a PCI error.  The
> > actual PCI error, if any, and any mitigation steps to handle the error are
> > platform specific and dependent on the FPGA image itself.  Therefore
> > predicting and implementing all necessary error
> 
> Why is the error handling unpredictable?

The error comes from the BMC/FPGA while the image is being burned, if the PCI device has not been removed first.
For example, another kernel thread may access the FPGA/BMC while the new FPGA image is being burned.

> 
> > mitigation in the kernel as part of the trigger would be an impossible task.
> 
> Or could we just gate the pcie link? Just like we should disable fpga bridges before
> reprogramming any fpga region.

I think the PCI "remove" sysfs file will remove all of the subdevices, such as the FPGA bridges, and the sequence of
disabling AER and then triggering the PCI remove will mitigate the PCI errors.
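
For reference, steps 3 to 6 of the documented sequence could be driven from
user space roughly as below. This is only a sketch under assumptions: the
helper names write_attr() and reload_image() are hypothetical, every path is
passed in by the caller because the real ones depend on the card's PCI
address, and the platform-specific steps (AER disable/enable, VFIO/UIO
unbind) are left out.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a short string to an existing sysfs-style file. */
static int write_attr(const char *path, const char *value)
{
	size_t len = strlen(value);
	ssize_t n;
	int fd, ret = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, value, len);
	if (n < 0)
		ret = -errno;
	else if ((size_t)n != len)
		ret = -EIO;

	close(fd);
	return ret;
}

/*
 * Hypothetical driver for steps 3-6 of the sequence in the ABI document.
 * Steps 1, 2 and 7 are platform specific and must be handled by the caller.
 */
static int reload_image(const char *image_load_path, const char *keyword,
			const char *const *remove_paths, size_t n_remove,
			const char *rescan_path, unsigned int delay_s)
{
	size_t i;
	int ret;

	assert(image_load_path && keyword && remove_paths && rescan_path);

	/* Step 3: trigger the reload with one of the available_images keywords. */
	ret = write_attr(image_load_path, keyword);
	if (ret)
		return ret;

	/* Step 4: remove every PCI device that belongs to the card. */
	for (i = 0; i < n_remove; i++) {
		ret = write_attr(remove_paths[i], "1");
		if (ret)
			return ret;
	}

	/* Step 5: wait for the reload (the ABI document says 10 seconds). */
	sleep(delay_s);

	/* Step 6: rescan the PCI bus. */
	return write_attr(rescan_path, "1");
}
```

A caller would pass the card's image_load file, the "remove" file of each of
its PCI devices, "/sys/bus/pci/rescan" and a delay of 10. The sketch stops at
the first failed write, so a rejected keyword (EINVAL) or a busy BMC (EBUSY)
does not lead to the devices being removed.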

> 
> Actually I did find something for link disabling which may be useful.
> 
> https://patchwork.kernel.org/project/linux-pci/patch/20190529104942.74991-1-mika.westerberg@xxxxxxxxxxxxxxx/

It looks like this link_disable patch has not been accepted by the PCI maintainer.
The link_disable sysfs file is only meant to protect against another user doing a rescan.

> 
> Thanks,
> Yilun
> 
> >
> > Matthew
> >
> > >
> > > > > diff --git a/drivers/fpga/intel-m10-bmc-sec-update.c b/drivers/fpga/intel-m10-bmc-sec-update.c
> > > > index 72c677c910de..3a082911cf67 100644
> > > > --- a/drivers/fpga/intel-m10-bmc-sec-update.c
> > > > +++ b/drivers/fpga/intel-m10-bmc-sec-update.c
> > > > @@ -14,6 +14,8 @@
> > > > >  #include <linux/platform_device.h>
> > > > >  #include <linux/slab.h>
> > > >
> > > > +struct image_load;
> > > > +
> > > >  struct m10bmc_sec {
> > > >  	struct device *dev;
> > > >  	struct intel_m10bmc *m10bmc;
> > > > @@ -21,6 +23,12 @@ struct m10bmc_sec {
> > > >  	char *fw_name;
> > > >  	u32 fw_name_id;
> > > >  	bool cancel_request;
> > > > +	struct image_load *image_load;	/* terminated with { } member */
> > > > +};
> > > > +
> > > > +struct image_load {
> > > > +	const char *name;
> > > > +	int (*load_image)(struct m10bmc_sec *sec);
> > > >  };
> > > >
> > > >  static DEFINE_XARRAY_ALLOC(fw_upload_xa);
> > > > @@ -137,6 +145,54 @@ DEVICE_ATTR_SEC_CSK_RO(pr, PR_PROG_ADDR +
> > > > CSK_VEC_OFFSET);
> > > >
> > > >  #define FLASH_COUNT_SIZE 4096	/* count stored as inverted bit vector */
> > > >
> > > > +static ssize_t available_images_show(struct device *dev,
> > > > +				     struct device_attribute *attr, char *buf) {
> > > > +	struct m10bmc_sec *sec = dev_get_drvdata(dev);
> > > > +	const struct image_load *hndlr;
> > > > +	ssize_t count = 0;
> > > > +
> > > > +	for (hndlr = sec->image_load; hndlr->name; hndlr++) {
> > > > +		count += scnprintf(buf + count, PAGE_SIZE - count,
> > > > +				   "%s ", hndlr->name);
> > > > +	}
> > > > +
> > > > +	buf[count - 1] = '\n';
> > > > +
> > > > +	return count;
> > > > +}
> > > > +static DEVICE_ATTR_RO(available_images);
> > > > +
> > > > +static ssize_t image_load_store(struct device *dev,
> > > > +				struct device_attribute *attr,
> > > > +				const char *buf, size_t count) {
> > > > +	struct m10bmc_sec *sec = dev_get_drvdata(dev);
> > > > +	const struct image_load *hndlr;
> > > > +	int ret = -EINVAL;
> > > > +
> > > > +	for (hndlr = sec->image_load; hndlr->name; hndlr++) {
> > > > +		if (sysfs_streq(buf, hndlr->name)) {
> > > > +			ret = hndlr->load_image(sec);
> > > > +			break;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	return ret ? : count;
> > > > +}
> > > > +static DEVICE_ATTR_WO(image_load);
> > > > +
> > > > +static struct attribute *m10bmc_control_attrs[] = {
> > > > +	&dev_attr_available_images.attr,
> > > > +	&dev_attr_image_load.attr,
> > > > +	NULL,
> > > > +};
> > > > +
> > > > +static struct attribute_group m10bmc_control_attr_group = {
> > > > +	.name = "control",
> > > > +	.attrs = m10bmc_control_attrs,
> > > > +};
> > > > +
> > > >  static ssize_t flash_count_show(struct device *dev,
> > > >  				struct device_attribute *attr, char *buf)  { @@ -
> 195,6 +251,7
> > > > @@ static struct attribute_group m10bmc_security_attr_group = {
> > > >
> > > >  static const struct attribute_group *m10bmc_sec_attr_groups[] = {
> > > >  	&m10bmc_security_attr_group,
> > > > +	&m10bmc_control_attr_group,
> > > >  	NULL,
> > > >  };
> > > >
> > > > @@ -208,6 +265,53 @@ static void log_error_regs(struct m10bmc_sec *sec,
> u32 doorbell)
> > > >  		dev_err(sec->dev, "RSU auth result: 0x%08x\n", auth_result);  }
> > > >
> > > > +static int m10bmc_sec_bmc_image_load(struct m10bmc_sec *sec,
> > > > +				     unsigned int val)
> > > > +{
> > > > +	u32 doorbell;
> > > > +	int ret;
> > > > +
> > > > +	if (val > 1) {
> > > > +		dev_err(sec->dev, "invalid reload val = %d\n", val);
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	ret = m10bmc_sys_read(sec->m10bmc, M10BMC_DOORBELL, &doorbell);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	if (doorbell & DRBL_REBOOT_DISABLED)
> > > > +		return -EBUSY;
> > > > +
> > > > +	return regmap_update_bits(sec->m10bmc->regmap,
> > > > +				  M10BMC_SYS_BASE + M10BMC_DOORBELL,
> > > > +				  DRBL_CONFIG_SEL | DRBL_REBOOT_REQ,
> > > > +				  FIELD_PREP(DRBL_CONFIG_SEL, val) |
> > > > +				  DRBL_REBOOT_REQ);
> > > > +}
> > > > +
> > > > > +static int m10bmc_sec_bmc_image_load_0(struct m10bmc_sec *sec)
> > > > > +{
> > > > > +	return m10bmc_sec_bmc_image_load(sec, 0);
> > > > > +}
> > > > > +
> > > > > +static int m10bmc_sec_bmc_image_load_1(struct m10bmc_sec *sec)
> > > > > +{
> > > > > +	return m10bmc_sec_bmc_image_load(sec, 1);
> > > > > +}
> > > > +
> > > > +static struct image_load m10bmc_image_load_hndlrs[] = {
> > > > +	{
> > > > +		.name = "bmc_factory",
> > > > +		.load_image = m10bmc_sec_bmc_image_load_1,
> > > > +	},
> > > > +	{
> > > > +		.name = "bmc_user",
> > > > +		.load_image = m10bmc_sec_bmc_image_load_0,
> > > > +	},
> > > > +	{}
> > > > +};
> > > > +
> > > >  static enum fw_upload_err rsu_check_idle(struct m10bmc_sec *sec)
> > > > {
> > > >  	u32 doorbell;
> > > > @@ -565,6 +669,7 @@ static int m10bmc_sec_probe(struct platform_device
> *pdev)
> > > >  	sec->dev = &pdev->dev;
> > > >  	sec->m10bmc = dev_get_drvdata(pdev->dev.parent);
> > > >  	dev_set_drvdata(&pdev->dev, sec);
> > > > +	sec->image_load = m10bmc_image_load_hndlrs;
> > > >
> > > >  	ret = xa_alloc(&fw_upload_xa, &sec->fw_name_id, sec,
> > > >  		       xa_limit_32b, GFP_KERNEL);
> > > > --
> > > > 2.26.2
> > > >
> > >



