Re: [PATCH v2] hwmon: intel-m10-bmc-hwmon: add hwmon support for Intel MAX 10 BMC

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/15/20 8:14 PM, Xu Yilun wrote:
> This patch adds hwmon functionality for Intel MAX 10 BMC chip. This BMC
> chip connects to a set of sensor chips to monitor current, voltage,
> thermal and power of different components on board. The BMC firmware is
> responsible for sensor data sampling and recording in shared registers.
> Host driver reads the sensor data from these shared registers and
> exposes them to users as hwmon interfaces.
> 
> Signed-off-by: Xu Yilun <yilun.xu@xxxxxxxxx>
> Signed-off-by: Wu Hao <hao.wu@xxxxxxxxx>
> Signed-off-by: Matthew Gerlach <matthew.gerlach@xxxxxxxxxxxxxxx>
> Signed-off-by: Tom Rix <trix@xxxxxxxxxx>

Is that really the Sign-off path, or are some of those reviewers ? Just wondering.

> ---
> v2: add the Documentation
>     refactor the code, provide static hwmon_channel_info
>     remove Unnecessary hwmon-sysfs.h
>     make the sensor data table const
> ---
>  Documentation/hwmon/index.rst               |   1 +
>  Documentation/hwmon/intel-m10-bmc-hwmon.rst |  78 ++++++
>  drivers/hwmon/Kconfig                       |  11 +
>  drivers/hwmon/Makefile                      |   1 +
>  drivers/hwmon/intel-m10-bmc-hwmon.c         | 355 ++++++++++++++++++++++++++++
>  5 files changed, 446 insertions(+)
>  create mode 100644 Documentation/hwmon/intel-m10-bmc-hwmon.rst
>  create mode 100644 drivers/hwmon/intel-m10-bmc-hwmon.c
> 
> diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
> index a926f1a..4bcb1a7 100644
> --- a/Documentation/hwmon/index.rst
> +++ b/Documentation/hwmon/index.rst
> @@ -74,6 +74,7 @@ Hardware Monitoring Kernel Drivers
>     ina209
>     ina2xx
>     ina3221
> +   intel-m10-bmc-hwmon
>     ir35221
>     ir38064
>     isl68137
> diff --git a/Documentation/hwmon/intel-m10-bmc-hwmon.rst b/Documentation/hwmon/intel-m10-bmc-hwmon.rst
> new file mode 100644
> index 0000000..3d148c6
> --- /dev/null
> +++ b/Documentation/hwmon/intel-m10-bmc-hwmon.rst
> @@ -0,0 +1,78 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Kernel driver intel-m10-bmc-hwmon
> +=================================
> +
> +Supported chips:
> +
> + * Intel MAX 10 BMC for Intel PAC N3000
> +
> +   Prefix: 'n3000bmc-hwmon'
> +
> +Author: Xu Yilun <yilun.xu@xxxxxxxxx>
> +
> +
> +Description
> +-----------
> +
> +This driver adds the temperature, voltage, current and power reading
> +support for the Intel MAX 10 Board Management Controller (BMC) chip.
> +The BMC chip is integrated in some Intel Programmable Acceleration
> +Cards (PAC). It connects to a set of sensor chips to monitor the
> +sensor data of different components on the board. The BMC firmware is
> +responsible for sensor data sampling and recording in shared
> +registers. The host driver reads the sensor data from these shared
> +registers and exposes them to users as hwmon interfaces.
> +
> +The BMC chip is implemented using the Intel MAX 10 CPLD. It could be
> +reprogramed to some variants in order to support different Intel
> +PACs. The driver is designed to be able to distinguish between the
> +variants, but now it only supports the BMC for Intel PAC N3000.
> +
> +
> +Sysfs attributes
> +----------------
> +
> +The following attributes are supported:
> +
> +- Intel MAX 10 BMC for Intel PAC N3000:
> +
> +======================= =======================================================
> +tempX_input             Temperature of the component (specified by tempX_label)
> +tempX_max               Temperature maximum setpoint of the component
> +tempX_crit              Temperature critical setpoint of the component
> +tempX_max_hyst          Hysteresis for temperature maximum of the component
> +tempX_crit_hyst         Hysteresis for temperature critical of the component
> +temp1_label             "Board Temperature"
> +temp2_label             "FPGA Die Temperature"
> +temp3_label             "QSFP0 Temperature"
> +temp4_label             "QSFP1 Temperature"
> +temp5_label             "Retimer A Temperature"
> +temp6_label             "Retimer A SerDes Temperature"
> +temp7_label             "Retimer B Temperature"
> +temp8_label             "Retimer B SerDes Temperature"
> +
> +inX_input               Measured voltage of the component (specified by
> +                        inX_label)
> +in0_label               "QSFP0 Supply Voltage"
> +in1_label               "QSFP1 Supply Voltage"
> +in2_label               "FPGA Core Voltage"
> +in3_label               "12V Backplane Voltage"
> +in4_label               "1.2V Voltage"
> +in5_label               "12V AUX Voltage"
> +in6_label               "1.8V Voltage"
> +in7_label               "3.3V Voltage"
> +
> +currX_input             Measured current of the component (specified by
> +                        currX_label)
> +curr1_label             "FPGA Core Current"
> +curr2_label             "12V Backplane Current"
> +curr3_label             "12V AUX Current"
> +
> +powerX_input            Measured power of the component (specified by
> +                        powerX_label)
> +power1_label            "Board Power"
> +
> +======================= =======================================================
> +
> +All the attributes are read-only.
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index 8dc28b2..53af15c 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -2064,6 +2064,17 @@ config SENSORS_XGENE
>  	  If you say yes here you get support for the temperature
>  	  and power sensors for APM X-Gene SoC.
>  
> +config SENSORS_INTEL_M10_BMC_HWMON
> +	tristate "Intel MAX10 BMC Hardware Monitoring"
> +	depends on MFD_INTEL_M10_BMC
> +	help
> +	  This driver provides support for the hardware monitoring functionality
> +	  on Intel MAX10 BMC chip.
> +
> +	  This BMC Chip is used on Intel FPGA PCIe Acceleration Cards (PAC). Its
> +	  sensors monitor various telemetry data of different components on the
> +	  card, e.g. board temperature, FPGA core temperature/voltage/current.
> +
>  if ACPI
>  
>  comment "ACPI drivers"
> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> index a8f4b35..ba5a25a 100644
> --- a/drivers/hwmon/Makefile
> +++ b/drivers/hwmon/Makefile
> @@ -90,6 +90,7 @@ obj-$(CONFIG_SENSORS_IIO_HWMON) += iio_hwmon.o
>  obj-$(CONFIG_SENSORS_INA209)	+= ina209.o
>  obj-$(CONFIG_SENSORS_INA2XX)	+= ina2xx.o
>  obj-$(CONFIG_SENSORS_INA3221)	+= ina3221.o
> +obj-$(CONFIG_SENSORS_INTEL_M10_BMC_HWMON) += intel-m10-bmc-hwmon.o
>  obj-$(CONFIG_SENSORS_IT87)	+= it87.o
>  obj-$(CONFIG_SENSORS_JC42)	+= jc42.o
>  obj-$(CONFIG_SENSORS_K8TEMP)	+= k8temp.o
> diff --git a/drivers/hwmon/intel-m10-bmc-hwmon.c b/drivers/hwmon/intel-m10-bmc-hwmon.c
> new file mode 100644
> index 0000000..ce73545
> --- /dev/null
> +++ b/drivers/hwmon/intel-m10-bmc-hwmon.c
> @@ -0,0 +1,355 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Intel MAX 10 BMC HWMON Driver
> + *
> + * Copyright (C) 2018-2020 Intel Corporation. All rights reserved.
> + *
> + */
> +#include <linux/device.h>
> +#include <linux/hwmon.h>
> +#include <linux/mfd/intel-m10-bmc.h>
> +#include <linux/module.h>
> +#include <linux/mod_devicetable.h>
> +#include <linux/platform_device.h>
> +
> +struct m10bmc_sdata {
> +	unsigned int reg_input;
> +	unsigned int reg_max;
> +	unsigned int reg_crit;
> +	unsigned int reg_hyst;
> +	unsigned int reg_min;
> +	unsigned int multiplier;
> +	const char *label;
> +};
> +
> +struct m10bmc_hwmon_board_data {
> +	const struct m10bmc_sdata *temp;
> +	const struct m10bmc_sdata *in;
> +	const struct m10bmc_sdata *curr;
> +	const struct m10bmc_sdata *power;
> +	const struct hwmon_channel_info **hinfo;
> +};
> +
> +struct m10bmc_hwmon {
> +	struct device *dev;
> +	struct hwmon_chip_info chip;
> +	char *hw_name;
> +	struct intel_m10bmc *m10bmc;
> +	struct m10bmc_hwmon_board_data *bdata;
> +};
> +
> +static const struct m10bmc_sdata n3000bmc_temp_tbl[] = {
> +	{ 0x100, 0x104, 0x108, 0x10c, 0x0, 500, "Board Temperature" },
> +	{ 0x110, 0x114, 0x118, 0x0, 0x0, 500, "FPGA Die Temperature" },
> +	{ 0x11c, 0x124, 0x120, 0x0, 0x0, 500, "QSFP0 Temperature" },
> +	{ 0x12c, 0x134, 0x130, 0x0, 0x0, 500, "QSFP1 Temperature" },
> +	{ 0x168, 0x0, 0x0, 0x0, 0x0, 500, "Retimer A Temperature" },
> +	{ 0x16c, 0x0, 0x0, 0x0, 0x0, 500, "Retimer A SerDes Temperature" },
> +	{ 0x170, 0x0, 0x0, 0x0, 0x0, 500, "Retimer B Temperature" },
> +	{ 0x174, 0x0, 0x0, 0x0, 0x0, 500, "Retimer B SerDes Temperature" },
> +};
> +
> +static const struct m10bmc_sdata n3000bmc_in_tbl[] = {
> +	{ 0x128, 0x0, 0x0, 0x0, 0x0, 1, "QSFP0 Supply Voltage" },
> +	{ 0x138, 0x0, 0x0, 0x0, 0x0, 1, "QSFP1 Supply Voltage" },
> +	{ 0x13c, 0x0, 0x0, 0x0, 0x0, 1, "FPGA Core Voltage" },
> +	{ 0x144, 0x0, 0x0, 0x0, 0x0, 1, "12V Backplane Voltage" },
> +	{ 0x14c, 0x0, 0x0, 0x0, 0x0, 1, "1.2V Voltage" },
> +	{ 0x150, 0x0, 0x0, 0x0, 0x0, 1, "12V AUX Voltage" },
> +	{ 0x158, 0x0, 0x0, 0x0, 0x0, 1, "1.8V Voltage" },
> +	{ 0x15c, 0x0, 0x0, 0x0, 0x0, 1, "3.3V Voltage" },
> +};
> +
> +static const struct m10bmc_sdata n3000bmc_curr_tbl[] = {
> +	{ 0x140, 0x0, 0x0, 0x0, 0x0, 1, "FPGA Core Current" },
> +	{ 0x148, 0x0, 0x0, 0x0, 0x0, 1, "12V Backplane Current" },
> +	{ 0x154, 0x0, 0x0, 0x0, 0x0, 1, "12V AUX Current" },
> +};
> +
> +static const struct m10bmc_sdata n3000bmc_power_tbl[] = {
> +	{ 0x160, 0x0, 0x0, 0x0, 0x0, 1000, "Board Power" },> +};
> +
> +static const struct hwmon_channel_info *n3000bmc_hinfo[] = {
> +	HWMON_CHANNEL_INFO(temp,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MAX_HYST |
> +			   HWMON_T_CRIT | HWMON_T_CRIT_HYST | HWMON_T_LABEL,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
> +			   HWMON_T_LABEL,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
> +			   HWMON_T_LABEL,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
> +			   HWMON_T_LABEL,
> +			   HWMON_T_INPUT | HWMON_T_LABEL,
> +			   HWMON_T_INPUT | HWMON_T_LABEL,
> +			   HWMON_T_INPUT | HWMON_T_LABEL,
> +			   HWMON_T_INPUT | HWMON_T_LABEL),
> +	HWMON_CHANNEL_INFO(in,
> +			   HWMON_I_INPUT | HWMON_I_LABEL,
> +			   HWMON_I_INPUT | HWMON_I_LABEL,
> +			   HWMON_I_INPUT | HWMON_I_LABEL,
> +			   HWMON_I_INPUT | HWMON_I_LABEL,
> +			   HWMON_I_INPUT | HWMON_I_LABEL,
> +			   HWMON_I_INPUT | HWMON_I_LABEL,
> +			   HWMON_I_INPUT | HWMON_I_LABEL,
> +			   HWMON_I_INPUT | HWMON_I_LABEL),
> +	HWMON_CHANNEL_INFO(curr,
> +			   HWMON_C_INPUT | HWMON_C_LABEL,
> +			   HWMON_C_INPUT | HWMON_C_LABEL,
> +			   HWMON_C_INPUT | HWMON_C_LABEL),
> +	HWMON_CHANNEL_INFO(power,
> +			   HWMON_P_INPUT | HWMON_P_LABEL),
> +	NULL
> +};
> +
> +struct m10bmc_hwmon_board_data n3000bmc_hwmon_bdata = {

static, and probably const

> +	.temp = n3000bmc_temp_tbl,
> +	.in = n3000bmc_in_tbl,
> +	.curr = n3000bmc_curr_tbl,
> +	.power = n3000bmc_power_tbl,

I would suggest to declare
	const struct m10bmc_sdata *tables[hwmon_max];
and initialize with
	.tables[hwmon_temp] = n3000bmc_temp_tbl,
	.tables[hwmon_in] = n3000bmc_in_tbl,
and so on. That makes the structure a bit larger, but simplifies
the operational code (see below).

> +	.hinfo = n3000bmc_hinfo,
> +};
> +
> +static umode_t
> +m10bmc_hwmon_is_visible(const void *data, enum hwmon_sensor_types type,
> +			u32 attr, int channel)
> +{
> +	return 0444;
> +}
> +
> +static const struct m10bmc_sdata *
> +find_sensor_data(struct m10bmc_hwmon *hw, enum hwmon_sensor_types type,
> +		 int channel)
> +{
> +	const struct m10bmc_sdata *tbl;
> +
> +	switch (type) {
> +	case hwmon_temp:
> +		tbl = hw->bdata->temp;
> +		break;
> +	case hwmon_in:
> +		tbl = hw->bdata->in;
> +		break;
> +	case hwmon_curr:
> +		tbl = hw->bdata->curr;
> +		break;
> +	case hwmon_power:
> +		tbl = hw->bdata->power;
> +		break;
> +	default:
> +		return ERR_PTR(-EOPNOTSUPP);
> +	}

Then you can use
	tbl = hw->bdata->tables[type];

> +
> +	if (!tbl)
> +		return ERR_PTR(-EOPNOTSUPP);
> +
> +	return &tbl[channel];
> +}
> +
> +static int do_sensor_read(struct m10bmc_hwmon *hw,
> +			  const struct m10bmc_sdata *data,
> +			  unsigned int regoff, long *val)
> +{
> +	unsigned int regval;
> +	int ret;
> +
> +	ret = m10bmc_sys_read(hw->m10bmc, regoff, &regval);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * BMC Firmware will return 0xdeadbeef if the sensor value is invalid
> +	 * at that time. This usually happens on sensor channels which connect
> +	 * to external pluggable modules, e.g. QSFP temperature and voltage.
> +	 * When the QSFP is unplugged from cage, driver will get 0xdeadbeef
> +	 * from their registers.
> +	 */
> +	if (regval == 0xdeadbeef)
> +		return -EBUSY;

Is this appropriate ? The description above would suggest differently.
-ENODATA, maybe ? Also, are those module hot pluggable ? If not,
it may be more appropriate to detect the status in the in_visible function
and not create affected attributes in the first place.

> +
> +	*val = regval * data->multiplier;
> +
> +	return 0;
> +}
> +
> +static int m10bmc_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
> +			     u32 attr, int channel, long *val)
> +{
> +	struct m10bmc_hwmon *hw = dev_get_drvdata(dev);
> +	unsigned int reg = 0, reg_hyst = 0;
> +	const struct m10bmc_sdata *data;
> +	long hyst, value;
> +	int ret;
> +
> +	data = find_sensor_data(hw, type, channel);
> +	if (IS_ERR(data))
> +		return PTR_ERR(data);
> +
> +	switch (type) {
> +	case hwmon_temp:
> +		switch (attr) {
> +		case hwmon_temp_input:
> +			reg = data->reg_input;
> +			break;
> +		case hwmon_temp_max_hyst:
> +			reg_hyst = data->reg_hyst;
> +			fallthrough;
> +		case hwmon_temp_max:
> +			reg = data->reg_max;
> +			break;
> +		case hwmon_temp_crit_hyst:
> +			reg_hyst = data->reg_hyst;
> +			fallthrough;
> +		case hwmon_temp_crit:
> +			reg = data->reg_crit;
> +			break;
> +		default:
> +			return -EOPNOTSUPP;
> +		}
> +		break;
> +	case hwmon_in:
> +		switch (attr) {
> +		case hwmon_in_input:
> +			reg = data->reg_input;
> +			break;
> +		case hwmon_in_max:
> +			reg = data->reg_max;
> +			break;
> +		case hwmon_in_crit:
> +			reg = data->reg_crit;
> +			break;
> +		case hwmon_in_min:
> +			reg = data->reg_min;
> +			break;
> +		default:
> +			return -EOPNOTSUPP;
> +		}
> +		break;
> +	case hwmon_curr:
> +		switch (attr) {
> +		case hwmon_curr_input:
> +			reg = data->reg_input;
> +			break;
> +		case hwmon_curr_max:
> +			reg = data->reg_max;
> +			break;
> +		case hwmon_curr_crit:
> +			reg = data->reg_crit;
> +			break;
> +		default:
> +			return -EOPNOTSUPP;
> +		}
> +		break;
> +	case hwmon_power:
> +		switch (attr) {
> +		case hwmon_power_input:
> +			reg = data->reg_input;
> +			break;
> +		default:
> +			return -EOPNOTSUPP;
> +		}
> +		break;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +
> +	if (!reg)
> +		return -EOPNOTSUPP;
> +

That can't happen with the code above. reg is always initialized.
All other cases already returned -EOPNOTSUPP.

> +	ret = do_sensor_read(hw, data, reg, &value);
> +	if (ret)
> +		return ret;
> +
> +	if (reg_hyst) {
> +		ret = do_sensor_read(hw, data, reg_hyst, &hyst);
> +		if (ret)
> +			return ret;
> +
> +		value -= hyst;

Is that also correct for _min attributes ? Normally I'd assume that the hysteresis
for those is larger than the base attribute value.

> +	}
> +
> +	*val = value;
> +
> +	return ret;

ret is always 0 here -> return 0;

> +}
> +
> +static int m10bmc_hwmon_read_string(struct device *dev,
> +				    enum hwmon_sensor_types type,
> +				    u32 attr, int channel, const char **str)
> +{
> +	struct m10bmc_hwmon *hw = dev_get_drvdata(dev);
> +	const struct m10bmc_sdata *data;
> +
> +	data = find_sensor_data(hw, type, channel);
> +	if (IS_ERR(data))
> +		return PTR_ERR(data);
> +
> +	*str = data->label;
> +
> +	return 0;
> +}
> +
> +static const struct hwmon_ops m10bmc_hwmon_ops = {
> +	.is_visible = m10bmc_hwmon_is_visible,
> +	.read = m10bmc_hwmon_read,
> +	.read_string = m10bmc_hwmon_read_string,
> +};
> +
> +static int m10bmc_hwmon_probe(struct platform_device *pdev)
> +{
> +	const struct platform_device_id *id = platform_get_device_id(pdev);
> +	struct intel_m10bmc *m10bmc = dev_get_drvdata(pdev->dev.parent);
> +	struct device *hwmon_dev, *dev = &pdev->dev;
> +	struct m10bmc_hwmon *hw;
> +	int i;
> +
> +	if (!id || !id->driver_data) {

That can not really happen.

> +		dev_err(dev, "Failed to get board data\n");
> +		return -ENODEV;
> +	}
> +
> +	hw = devm_kzalloc(dev, sizeof(*hw), GFP_KERNEL);
> +	if (!hw)
> +		return -ENOMEM;
> +
> +	hw->dev = dev;
> +	hw->m10bmc = m10bmc;
> +	hw->bdata = (struct m10bmc_hwmon_board_data *)id->driver_data;
> +
> +	hw->chip.info = hw->bdata->hinfo;
> +	hw->chip.ops = &m10bmc_hwmon_ops;
> +
> +	hw->hw_name = devm_kstrdup(dev, id->name, GFP_KERNEL);
> +	if (!hw->hw_name)
> +		return -ENOMEM;
> +
> +	for (i = 0; hw->hw_name[i]; i++)
> +		if (hwmon_is_bad_char(hw->hw_name[i]))
> +			hw->hw_name[i] = '_';
> +
> +	hwmon_dev = devm_hwmon_device_register_with_info(dev, hw->hw_name,
> +							 hw, &hw->chip, NULL);
> +	return PTR_ERR_OR_ZERO(hwmon_dev);
> +}
> +
> +static const struct platform_device_id intel_m10bmc_hwmon_ids[] = {
> +	{
> +		.name = "n3000bmc-hwmon",
> +		.driver_data = (unsigned long)&n3000bmc_hwmon_bdata,
> +	},
> +	{ }
> +};
> +
> +static struct platform_driver intel_m10bmc_hwmon_driver = {
> +	.probe = m10bmc_hwmon_probe,
> +	.driver = {
> +		.name = "intel-m10-bmc-hwmon",
> +	},
> +	.id_table = intel_m10bmc_hwmon_ids,
> +};
> +module_platform_driver(intel_m10bmc_hwmon_driver);
> +
> +MODULE_DEVICE_TABLE(platform, intel_m10bmc_hwmon_ids);
> +MODULE_AUTHOR("Intel Corporation");
> +MODULE_DESCRIPTION("Intel MAX 10 BMC hardware monitor");
> +MODULE_LICENSE("GPL");
> 




[Index of Archives]     [LM Sensors]     [Linux Sound]     [ALSA Users]     [ALSA Devel]     [Linux Audio Users]     [Linux Media]     [Kernel]     [Gimp]     [Yosemite News]     [Linux Media]

  Powered by Linux