RE: [RFC PATCH 2/2] PCI:hip08:Add driver to handle HiSilicon hip08 PCIe controller's errors

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Bjorn,

Thanks for reviewing the patch.
Please find reply inline.

>-----Original Message-----
>From: Bjorn Helgaas [mailto:helgaas@xxxxxxxxxx]
>Sent: 15 January 2020 14:14
>To: Shiju Jose <shiju.jose@xxxxxxxxxx>
>Cc: linux-acpi@xxxxxxxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; linux-
>kernel@xxxxxxxxxxxxxxx; rjw@xxxxxxxxxxxxx; lenb@xxxxxxxxxx; bp@xxxxxxxxx;
>james.morse@xxxxxxx; tony.luck@xxxxxxxxx; gregkh@xxxxxxxxxxxxxxxxxxx;
>zhangliguang@xxxxxxxxxxxxxxxxx; tglx@xxxxxxxxxxxxx; Linuxarm
><linuxarm@xxxxxxxxxx>; Jonathan Cameron
><jonathan.cameron@xxxxxxxxxx>; tanxiaofei <tanxiaofei@xxxxxxxxxx>;
>yangyicong <yangyicong@xxxxxxxxxx>
>Subject: Re: [RFC PATCH 2/2] PCI:hip08:Add driver to handle HiSilicon hip08
>PCIe controller's errors
>
>Follow convention for subject line.

Ok. We will ix it.

>
>On Wed, Jan 15, 2020 at 11:01:40AM +0000, Shiju Jose wrote:
>> From: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
>>
>> The hip08 error handle driver logs and reports PCIe controller's
>> recoverable errors.
>> Perform root port reset and restore link status for the recovery.
>>
>> Following are some of the PCIe controller's recoverable errors 1.
>> completion transmission timeout error.
>> 2. CRS retry counter over the threshold error.
>> 3. ECC 2 bit errors
>> 4. AXI bresponse/rresponse errors etc.
>>
>> RFC: The appropriate location for this driver may be discussed.
>>
>> Signed-off-by: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
>> Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx>
>> ---
>>  drivers/pci/controller/Kconfig                 |   8 +
>>  drivers/pci/controller/Makefile                |   1 +
>>  drivers/pci/controller/pcie-hisi-hip08-error.c | 323
>> +++++++++++++++++++++++++
>
>Seems like this driver and its Kconfig should be near
>drivers/pci/controller/dwc/pcie-hisi.c.

pcie-hisi.c was for our old hip* devices.
Our hip08 PCIe controller doesn't use DWC ip.
Thus the driver to be in /drivers/pci/controller/ ?

>
>>  3 files changed, 332 insertions(+)
>>  create mode 100644 drivers/pci/controller/pcie-hisi-hip08-error.c
>>
>> diff --git a/drivers/pci/controller/Kconfig
>> b/drivers/pci/controller/Kconfig index c77069c..0ee99b8 100644
>> --- a/drivers/pci/controller/Kconfig
>> +++ b/drivers/pci/controller/Kconfig
>> @@ -260,6 +260,14 @@ config PCI_HYPERV_INTERFACE
>>  	  The Hyper-V PCI Interface is a helper driver allows other drivers to
>>  	  have a common interface with the Hyper-V PCI frontend driver.
>>
>> +config PCIE_HISI_HIP08_ERR_HANDLER
>
>Config symbol is too long, maybe CONFIG_PCI_HISI_ERR or similar (to be
>parallel with existing CONFIG_PCI_HISI).  Both should probably be
>"CONFIG_PCIE", not "CONFIG_PCI".  I can't remember why CONFIG_PCI_HISI is
>that way.

Ok. We will change it to CONFIG_PCI_HISI_ERR.

>
>> +	depends on ARM64 || COMPILE_TEST
>> +	depends on (ACPI && PCI_QUIRKS)
>> +	bool "HiSilicon hip08 Soc PCIe local error handling driver"
>> +	help
>> +	  Say Y here if you want PCIe error handling support
>> +	  for the PCIe local(controller) errors on HiSilicon hip08 SoC
>> +
>>  source "drivers/pci/controller/dwc/Kconfig"
>>  source "drivers/pci/controller/cadence/Kconfig"
>>  endmenu
>> diff --git a/drivers/pci/controller/Makefile
>> b/drivers/pci/controller/Makefile index 3d4f597..ac9852f 100644
>> --- a/drivers/pci/controller/Makefile
>> +++ b/drivers/pci/controller/Makefile
>> @@ -28,6 +28,7 @@ obj-$(CONFIG_PCIE_MEDIATEK) += pcie-mediatek.o
>>  obj-$(CONFIG_PCIE_MOBIVEIL) += pcie-mobiveil.o
>>  obj-$(CONFIG_PCIE_TANGO_SMP8759) += pcie-tango.o
>>  obj-$(CONFIG_VMD) += vmd.o
>> +obj-$(CONFIG_PCIE_HISI_HIP08_ERR_HANDLER) += pcie-hisi-hip08-error.o
>>  # pcie-hisi.o quirks are needed even without CONFIG_PCIE_DW
>>  obj-y				+= dwc/
>>
>> diff --git a/drivers/pci/controller/pcie-hisi-hip08-error.c
>> b/drivers/pci/controller/pcie-hisi-hip08-error.c
>> new file mode 100644
>> index 0000000..6f5d002
>> --- /dev/null
>> +++ b/drivers/pci/controller/pcie-hisi-hip08-error.c
>> @@ -0,0 +1,323 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * PCIe driver for handling PCIe local errors occurred on
>> + * HiSilicon hip08 PCIe controller.
>
>"PCIe" occurs too many times in this sentence.  Strictly speaking this is not a
>"PCIe driver"; it's a driver for an ACPI device that reports hip08-related errors.

Ok. Will remove it.

>Hopefully we don't need a separate driver for every hip* device, so maybe the
>"hip08" name is too specific.

Currently we are using this single driver for all hip08 device. so maybe this is better:
yes, a separate driver is unnecessary. We'll remove "hip08" name and
make it more generic for hip* device.

>
>> + * Copyright (c) 2018-2019 Hisilicon Limited.
>
>Capitalize "HiSilicon" consistently.  Also "hip08"; previous practice in
>drivers/pci is "Hip05", "Hip06", so use that unless HiSilicon itself does it
>differently.

Ok. We will fix.

>
>> +#include <linux/acpi.h>
>> +#include <acpi/ghes.h>
>> +#include <linux/bitfield.h>
>> +#include <linux/bitops.h>
>> +#include <linux/delay.h>
>> +#include <linux/irq.h>
>> +#include <linux/irqdomain.h>
>> +#include <linux/list.h>
>> +#include <linux/pci.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/resource.h>
>> +#include <linux/kfifo.h>
>> +#include <linux/spinlock.h>
>> +
>> +#include "../pci.h"
>> +
>> +#define HISI_PCIE_ERR_RECOVER_RING_SIZE           16
>> +#define	HISI_PCIE_ERR_INFO_SIZE	1024
>> +
>> +/* HISI PCIe Local error definitions */
>> +#define HISI_PCIE_ERR_MISC_REGS	33
>> +
>> +#define HISI_PCIE_SUB_MODULE_ID_AP	0
>> +#define HISI_PCIE_SUB_MODULE_ID_TL	1
>> +#define HISI_PCIE_SUB_MODULE_ID_MAC	2
>> +#define HISI_PCIE_SUB_MODULE_ID_DL	3
>> +#define HISI_PCIE_SUB_MODULE_ID_SDI	4
>> +
>> +#define HISI_PCIE_LOCAL_VALID_VERSION		BIT(0)
>> +#define HISI_PCIE_LOCAL_VALID_SOC_ID		BIT(1)
>> +#define HISI_PCIE_LOCAL_VALID_SOCKET_ID		BIT(2)
>> +#define HISI_PCIE_LOCAL_VALID_NIMBUS_ID		BIT(3)
>> +#define HISI_PCIE_LOCAL_VALID_SUB_MODULE_ID	BIT(4)
>> +#define HISI_PCIE_LOCAL_VALID_CORE_ID		BIT(5)
>> +#define HISI_PCIE_LOCAL_VALID_PORT_ID		BIT(6)
>> +#define HISI_PCIE_LOCAL_VALID_ERR_TYPE		BIT(7)
>> +#define HISI_PCIE_LOCAL_VALID_ERR_SEVERITY	BIT(8)
>> +#define HISI_PCIE_LOCAL_VALID_ERR_MISC		9
>> +
>> +#define HISI_ERR_SEV_RECOVERABLE	0
>> +#define HISI_ERR_SEV_FATAL		1
>> +#define HISI_ERR_SEV_CORRECTED		2
>> +#define HISI_ERR_SEV_NONE		3
>> +
>> +guid_t hip08_pcie_sec_type = GUID_INIT(0xB2889FC9, 0xE7D7, 0x4F9D,
>0xA8, 0x67,
>> +				       0xAF, 0x42, 0xE9, 0x8B, 0xE7, 0x72);
>> +
>> +#define HISI_PCIE_CORE_ID(v)             ((v) >> 3)
>> +#define HISI_PCIE_PORT_ID(core, v)       ((v >> 1) + (core << 3))
>> +#define HISI_PCIE_CORE_PORT_ID(v)        ((v % 8) << 1)
>> +#define HISI_PCIE_ROOT_BUSNR(v)          ((v) ? 0x80 : 0)
>
>Is the root bus number really hard-wired in the chip?  You're saying the only
>possible root bus numbers are 0x80 and 0x00?  Typically this bus number is
>programmable.

We will fix it and remove the macro. We'll get the root bus number from acpi instead.

>
>Why parens around "v" (sometimes) but not others and "core"?

We will correct it.

>
>> +struct hisi_pcie_local_err_data {
>> +	uint64_t   val_bits;
>> +	uint8_t    version;
>> +	uint8_t    soc_id;
>> +	uint8_t    socket_id;
>> +	uint8_t    nimbus_id;
>> +	uint8_t    sub_module_id;
>> +	uint8_t    core_id;
>> +	uint8_t    port_id;
>> +	uint8_t    err_severity;
>> +	uint16_t   err_type;
>> +	uint8_t    reserv[2];
>> +	uint32_t   err_misc[HISI_PCIE_ERR_MISC_REGS];
>
>Use u64, u8, u32 throughout instead.

We will change it.

>
>> +};
>> +
>> +struct pcie_err_info {
>> +	struct hisi_pcie_local_err_data err_data;
>> +	struct platform_device *pdev;
>> +};
>> +
>> +static char *pcie_local_sub_module_name(uint8_t id) {
>> +	switch (id) {
>> +	case HISI_PCIE_SUB_MODULE_ID_AP: return "AP Layer";
>> +	case HISI_PCIE_SUB_MODULE_ID_TL: return "TL Layer";
>> +	case HISI_PCIE_SUB_MODULE_ID_MAC: return "MAC Layer";
>> +	case HISI_PCIE_SUB_MODULE_ID_DL: return "DL Layer";
>> +	case HISI_PCIE_SUB_MODULE_ID_SDI: return "SDI Layer";
>> +	}
>> +
>> +	return "unknown";
>> +}
>> +
>> +static char *err_severity(uint8_t err_sev) {
>> +	switch (err_sev) {
>> +	case HISI_ERR_SEV_RECOVERABLE: return "recoverable";
>> +	case HISI_ERR_SEV_FATAL: return "fatal";
>> +	case HISI_ERR_SEV_CORRECTED: return "corrected";
>> +	case HISI_ERR_SEV_NONE: return "none";
>> +	}
>> +
>> +	return "unknown";
>> +}
>> +
>> +static struct pci_dev *hisi_hip08_pcie_get_rp(u32 chip_id, u32
>> +port_id) {
>> +	u32 devfn = PCI_DEVFN(port_id, 0);
>> +	u32 busnr = HISI_PCIE_ROOT_BUSNR(chip_id);
>> +
>> +	return pci_get_domain_bus_and_slot(0, busnr, devfn); }
>> +
>> +static int hisi_hip08_pcie_port_acpi_reset(struct platform_device *pdev,
>> +					u32 chip_id, u32 port_id)
>> +{
>> +	struct device *dev = &(pdev->dev);
>
>Unnecessary parens.  More occurrences below.

We will change it.

>
>> +	struct acpi_object_list arg_list;
>> +	union acpi_object arg[3];
>> +
>> +	arg[0].type = ACPI_TYPE_INTEGER;
>> +	arg[0].integer.value = chip_id;
>> +	arg[1].type = ACPI_TYPE_INTEGER;
>> +	arg[1].integer.value = HISI_PCIE_CORE_ID(port_id);
>> +	arg[2].type = ACPI_TYPE_INTEGER;
>> +	arg[2].integer.value = HISI_PCIE_CORE_PORT_ID(port_id);
>> +
>> +	arg_list.count = 3;
>> +	arg_list.pointer = arg;
>> +	/* Call the ACPI handle to reset root port  */
>
>s/root port  /root port /
>
>> +	if (ACPI_HANDLE(dev)) {
>
>Restructure this to return early for error and unindent the following, e.g.,

Ok. We will change it.

>
>  acpi_handle handle = ACPI_HANDLE(dev);
>
>  if (!handle) {
>    dev_err(...);
>    return -EINVAL;
>  }
>
>  arg[0].type = ACPI_TYPE_INTEGER;
>  ...
>  s = acpi_evaluate_integer(handle, ...);
>
>> +		unsigned long long data = 0;
>> +		acpi_status s;
>> +
>> +		s = acpi_evaluate_integer(ACPI_HANDLE(dev),
>> +				"RST", &arg_list, &data);
>> +
>> +		if (ACPI_FAILURE(s)) {
>> +			dev_err(dev, "No Reset method\n");
>> +			return -EIO;
>> +		}
>> +
>> +		if (data) {
>> +			dev_err(dev, "Failed to Reset\n");
>> +			return -EIO;
>> +		}
>> +
>> +	} else {
>> +		dev_err(dev, "No Reset method\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int hisi_hip08_pcie_port_reset(struct platform_device *dev,
>> +				      u32 chip_id, u32 port_id)
>> +{
>> +	struct pci_dev *pdev;
>> +	struct pci_bus *root_bus;
>> +
>> +	pdev = hisi_hip08_pcie_get_rp(chip_id, port_id);
>> +	if (!pdev) {
>> +		dev_info(&(dev->dev), "Fail to get root port device\n");
>> +		return -ENODEV;
>> +	}
>> +	root_bus = pdev->bus;
>> +
>> +	pci_stop_and_remove_bus_device_locked(pdev);
>> +
>> +	if (hisi_hip08_pcie_port_acpi_reset(dev, chip_id, port_id))
>> +		return -EIO;
>> +	ssleep(1UL);
>
>Please include a comment that cites the spec section that requires this sleep.

we'll add a comment. We use a 1s delay here as does in pci_reset_secondary_bus().
It's necessary for re-initialization of subordinate devices.

>
>> +
>> +	/* add root port and downstream devices */
>> +	pci_lock_rescan_remove();
>> +	pci_rescan_bus(root_bus);
>> +	pci_unlock_rescan_remove();
>> +
>> +	return 0;
>> +}
>> +
>> +static void pcie_local_error_handle(const struct hisi_pcie_local_err_data
>*err,
>> +				    struct platform_device *pdev) {
>> +	char buf[HISI_PCIE_ERR_INFO_SIZE];
>> +	char *p = buf, *end = buf + sizeof(buf);
>> +	struct device *dev = &(pdev->dev);
>> +	uint32_t i;
>> +	int rc;
>> +
>> +	if (err->val_bits == 0) {
>> +		dev_warn(dev, "%s: no valid error information\n", __func__);
>> +		return;
>> +	}
>> +
>> +	/* Logging */
>> +	p += snprintf(p, end - p, "[ Table version=%d ", err->version);
>> +	if (err->val_bits & HISI_PCIE_LOCAL_VALID_SOC_ID)
>> +		p += snprintf(p, end - p, "SOC ID=%d ", err->soc_id);
>> +
>> +	if (err->val_bits & HISI_PCIE_LOCAL_VALID_SOCKET_ID)
>> +		p += snprintf(p, end - p, "socket ID=%d ", err->socket_id);
>> +
>> +	if (err->val_bits & HISI_PCIE_LOCAL_VALID_NIMBUS_ID)
>> +		p += snprintf(p, end - p, "nimbus ID=%d ", err->nimbus_id);
>> +
>> +	if (err->val_bits & HISI_PCIE_LOCAL_VALID_SUB_MODULE_ID)
>> +		p += snprintf(p, end - p, "sub module=%s ",
>> +			      pcie_local_sub_module_name(err-
>>sub_module_id));
>> +
>> +	if (err->val_bits & HISI_PCIE_LOCAL_VALID_CORE_ID)
>> +		p += snprintf(p, end - p, "core ID=core%d ", err->core_id);
>> +
>> +	if (err->val_bits & HISI_PCIE_LOCAL_VALID_PORT_ID)
>> +		p += snprintf(p, end - p, "port ID=port%d ", err->port_id);
>> +
>> +	if (err->val_bits & HISI_PCIE_LOCAL_VALID_ERR_SEVERITY)
>> +		p += snprintf(p, end - p, "error severity=%s ",
>> +			      err_severity(err->err_severity));
>> +
>> +	if (err->val_bits & HISI_PCIE_LOCAL_VALID_ERR_TYPE)
>> +		p += snprintf(p, end - p, "error type=0x%x ", err->err_type);
>> +
>> +	p += snprintf(p, end - p, "]\n");
>> +	dev_info(dev, "\nHISI HIP08: PCIe local error\n");
>> +	dev_info(dev, "%s\n", buf);
>> +
>> +	dev_info(dev, "Reg Dump:\n");
>> +	for (i = 0; i < HISI_PCIE_ERR_MISC_REGS; i++) {
>> +		if (err->val_bits & BIT(HISI_PCIE_LOCAL_VALID_ERR_MISC + i))
>> +			dev_info(dev,
>> +				 "ERR_MISC_%d=0x%x\n", i, err->err_misc[i]);
>> +	}
>> +
>> +	/* Recovery for the PCIe local errors */
>> +	if (err->err_severity == HISI_ERR_SEV_RECOVERABLE) {
>> +		/* try reset PCI port for the error recovery */
>> +		rc = hisi_hip08_pcie_port_reset(pdev, err->socket_id,
>> +				HISI_PCIE_PORT_ID(err->core_id, err-
>>port_id));
>> +		if (rc) {
>> +			dev_info(dev, "fail to do hip08 pcie port reset\n");
>> +			return;
>> +		}
>> +	}
>> +}
>> +
>> +static DEFINE_KFIFO(pcie_err_recover_ring, struct pcie_err_info,
>> +		    HISI_PCIE_ERR_RECOVER_RING_SIZE); static
>> +DEFINE_SPINLOCK(pcie_err_recover_ring_lock);
>> +
>> +static void pcie_local_err_recover_work_func(struct work_struct
>> +*work) {
>> +	struct pcie_err_info pcie_err_entry;
>> +
>> +	while (kfifo_get(&pcie_err_recover_ring, &pcie_err_entry)) {
>> +		pcie_local_error_handle(&pcie_err_entry.err_data,
>> +					pcie_err_entry.pdev);
>> +	}
>> +}
>> +
>> +static DECLARE_WORK(pcie_err_recover_work,
>> +pcie_local_err_recover_work_func);
>> +
>> +
>> +static void hip08_pcie_local_error_handle(struct acpi_hest_generic_data
>*gdata,
>> +					  int sev, void *data)
>> +{
>> +	const struct hisi_pcie_local_err_data *err_data =
>> +					acpi_hest_get_payload(gdata);
>> +	struct pcie_err_info err_info;
>> +	struct platform_device *pdev = data;
>> +	struct device *dev = &(pdev->dev);
>> +
>> +	memcpy(&err_info.err_data, err_data, sizeof(*err_data));
>> +	err_info.pdev = pdev;
>> +
>> +	if (kfifo_in_spinlocked(&pcie_err_recover_ring, &err_info, 1,
>> +				&pcie_err_recover_ring_lock))
>> +		schedule_work(&pcie_err_recover_work);
>> +	else
>> +		dev_warn(dev, "Buffer overflow when recovering PCIe local
>> +error\n");
>
>I'd call this a "queue full" warning or similar.  "Buffer overflow"
>suggests that we wrote past the end of a buffer and corrupted some memory,
>but that's not the case here.

We will change it to "queue full".

>
>> +}
>> +
>> +static int hisi_hip08_pcie_err_handler_probe(struct platform_device
>> +*pdev) {
>> +	int ret = 0;
>
>Pointless local variable; maybe you meant to return failure if
>ghes_register_event_handler() fails?  Don't initialize unless it's necessary.

We will fix it. We  need to return error value on ghes_register_event_handler  failure.

>
>> +
>> +	if (ghes_register_event_handler(hip08_pcie_sec_type,
>> +					hip08_pcie_local_error_handle,
>> +					pdev)) {
>> +		dev_err(&(pdev->dev), "%s : ghes_register_event_handler
>fail\n",
>> +			__func__);
>> +		return ret;
>> +}
>
>Indentation error.

Ok. we will correct it.

>
>> +
>> +	return 0;
>> +}
>> +
>> +static int hisi_hip08_pcie_err_handler_remove(struct platform_device
>> +*pdev) {
>> +	ghes_unregister_event_handler(hip08_pcie_sec_type);
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct acpi_device_id hip08_pcie_acpi_match[] = {
>> +	{ "HISI0361", 0 },
>> +	{ }
>> +};
>> +
>> +static struct platform_driver hisi_hip08_pcie_err_handler_driver = {
>> +	.driver = {
>> +		.name	= "hisi-hip08-pcie-err-handler",
>> +		.acpi_match_table = hip08_pcie_acpi_match,
>> +	},
>> +	.probe		= hisi_hip08_pcie_err_handler_probe,
>> +	.remove		= hisi_hip08_pcie_err_handler_remove,
>> +};
>> +module_platform_driver(hisi_hip08_pcie_err_handler_driver);
>> +
>> +MODULE_DESCRIPTION("HiSilicon HIP08 PCIe controller error handling
>> +driver"); MODULE_LICENSE("GPL v2");
>> +
>> --
>> 1.9.1
>>
>>

Thanks,
Shiju



[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux