Hi Boris, > -----Original Message----- > From: Borislav Petkov <bp@xxxxxxxxx> > Sent: Friday, November 25, 2022 8:42 PM > To: Potthuri, Sai Krishna <sai.krishna.potthuri@xxxxxxx> > Cc: Rob Herring <robh+dt@xxxxxxxxxx>; Krzysztof Kozlowski > <krzysztof.kozlowski+dt@xxxxxxxxxx>; Michal Simek > <michal.simek@xxxxxxxxxx>; Mauro Carvalho Chehab > <mchehab@xxxxxxxxxx>; Tony Luck <tony.luck@xxxxxxxxx>; James Morse > <james.morse@xxxxxxx>; Robert Richter <rric@xxxxxxxxxx>; > devicetree@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux- > kernel@xxxxxxxxxxxxxxx; linux-edac@xxxxxxxxxxxxxxx; > saikrishna12468@xxxxxxxxx; git (AMD-Xilinx) <git@xxxxxxx>; Datta, > Shubhrajyoti <shubhrajyoti.datta@xxxxxxx>; kernel test robot > <lkp@xxxxxxxxx> > Subject: Re: [PATCH v6 2/2] EDAC/zynqmp: Add EDAC support for Xilinx > ZynqMP OCM > > On Wed, Nov 02, 2022 at 12:36:55PM +0530, Sai Krishna Potthuri wrote: > > Add EDAC support for Xilinx ZynqMP OCM Controller, > > So this: > > > this driver reports CE and UE errors upon interrupt generation, and > > also creates UE/CE debugfs entries for error injection when EDAC_DEBUG > > config is enabled. > > I can see in the patch itself. > > Do not talk about what your patch does - that should hopefully be visible > from the diff. Rather, talk about *why* you're doing what you're doing. > > Like, for example, you can explain here how this driver is going to co-exist > with the other EDAC driver, i.e., the Synopsys one or the DDRMC. Ok, will update. > > > Co-developed-by: Shubhrajyoti Datta <shubhrajyoti.datta@xxxxxxx> > > Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xxxxxxx> > > Signed-off-by: Sai Krishna Potthuri <sai.krishna.potthuri@xxxxxxx> > > Reported-by: kernel test robot <lkp@xxxxxxxxx> > > What exactly was reported by the robot? > > Pls put that in the commit message as > > "Fix issue <BLA> as reported by the robot." > > so that it is clear what that Reported-by: refers to. Ok, will update. > > > --- > > MAINTAINERS | 7 + > > drivers/edac/Kconfig | 9 + > > drivers/edac/Makefile | 1 + > > drivers/edac/zynqmp_ocm_edac.c | 485 > > +++++++++++++++++++++++++++++++++ > > 4 files changed, 502 insertions(+) > > create mode 100644 drivers/edac/zynqmp_ocm_edac.c > > > > diff --git a/MAINTAINERS b/MAINTAINERS index > > edc96cdb85e8..7a40caf536c2 100644 > > --- a/MAINTAINERS > > +++ b/MAINTAINERS > > @@ -21692,6 +21692,13 @@ F: > Documentation/devicetree/bindings/dma/xilinx/xlnx,zynqmp- > dpdma.yaml > > F: drivers/dma/xilinx/xilinx_dpdma.c > > F: include/dt-bindings/dma/xlnx-zynqmp-dpdma.h > > > > +XILINX ZYNQMP OCM EDAC DRIVER > > +M: Shubhrajyoti Datta <shubhrajyoti.datta@xxxxxxx> > > +M: Sai Krishna Potthuri <sai.krishna.potthuri@xxxxxxx> > > +S: Maintained > > +F: Documentation/devicetree/bindings/memory- > controllers/xlnx,zynqmp-ocmc-1.0.yaml > > +F: drivers/edac/zynqmp_ocm_edac.c > > + > > XILINX ZYNQMP PSGTR PHY DRIVER > > M: Anurag Kumar Vulisha <anurag.kumar.vulisha@xxxxxxxxxx> > > M: Laurent Pinchart <laurent.pinchart@xxxxxxxxxxxxxxxx> > > diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig index > > 58ab63642e72..bc30b7d02062 100644 > > --- a/drivers/edac/Kconfig > > +++ b/drivers/edac/Kconfig > > @@ -539,4 +539,13 @@ config EDAC_DMC520 > > Support for error detection and correction on the > > SoCs with ARM DMC-520 DRAM controller. > > > > +config EDAC_ZYNQMP_OCM > > + tristate "Xilinx ZynqMP OCM Controller" > > + depends on ARCH_ZYNQMP || COMPILE_TEST > > + help > > + This driver is targeted for Xilinx ZynqMP OCM (On Chip Memory) > > "This driver supports ...." Ok, will update. > > > + controller to support for error detection and correction. > > + This driver can also be built as a module. If so, the module > > + will be called zynqmp_ocm_edac. > > + > > endif # EDAC > > diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile index > > 2d1641a27a28..634c1cec1588 100644 > > --- a/drivers/edac/Makefile > > +++ b/drivers/edac/Makefile > > @@ -84,3 +84,4 @@ obj-$(CONFIG_EDAC_QCOM) += > qcom_edac.o > > obj-$(CONFIG_EDAC_ASPEED) += aspeed_edac.o > > obj-$(CONFIG_EDAC_BLUEFIELD) += bluefield_edac.o > > obj-$(CONFIG_EDAC_DMC520) += dmc520_edac.o > > +obj-$(CONFIG_EDAC_ZYNQMP_OCM) += > zynqmp_ocm_edac.o > > Is there going to be another ZynqMP EDAC driver? > > If not, this one could be simply called zynqmp_edac.c or? As we communicated earlier for ZynqMP platform we have both Synopsys (for DDRMC) and zynqmp_ocm_edac (for OCM) drivers. Just to be clear about what this driver is targeted for, we used ocm as part of file name. Ok, zynqmp_edac.c looks fine, will update. > > > diff --git a/drivers/edac/zynqmp_ocm_edac.c > > b/drivers/edac/zynqmp_ocm_edac.c new file mode 100644 index > > 000000000000..32f025b72380 > > --- /dev/null > > +++ b/drivers/edac/zynqmp_ocm_edac.c > > > +/* Interrupt masks */ > > +#define OCM_CEINTR_MASK BIT(6) > > +#define OCM_UEINTR_MASK BIT(7) > > +#define OCM_ECC_ENABLE_MASK BIT(0) > > +#define OCM_CEUE_MASK GENMASK(7, 6) > > Drop that one and use > > OCM_CEINTR_MASK | OCM_UEINTR_MASK > > everywhere. Ok, will update. > > > +#define OCM_FICOUNT_MASK GENMASK(23, 0) > > +#define OCM_NUM_UE_BITPOS 2 > > +#define OCM_BASEVAL 0xFFFC0000 > > +#define EDAC_DEVICE "ZynqMP-OCM" > > + > > +/** > > + * struct ecc_error_info - ECC error log information > > + * @addr: Fault generated at this address > > + * @data0: Generated fault data (lower 32-bit) > > + * @data1: Generated fault data (upper 32-bit) > > + */ > > +struct ecc_error_info { > > + u32 addr; > > + u32 data0; > > + u32 data1; > > What's wrong with calling those fault_lo and fault_hi then? Ok, will update. > > > +/** > > + * get_error_info - Get the current ECC error info > > + * @base: Pointer to the base address of the OCM memory controller > > + * @p: Pointer to the OCM ECC status structure > > + * @mask: Status register mask value > > + * > > + * Determines there is any ECC error or not > > + * > > + */ > > +static void get_error_info(void __iomem *base, struct ecc_status *p, > > +int mask) { > > + if (mask & OCM_CEINTR_MASK) { > > + p->ce_cnt++; > > + p->ceinfo.data0 = readl(base + CE_FFD0_OFST); > > + p->ceinfo.data1 = readl(base + CE_FFD1_OFST); > > + p->ceinfo.addr = (OCM_BASEVAL | readl(base + > CE_FFA_OFST)); > > + writel(ECC_CTRL_CLR_CE_ERR, base + OCM_ISR_OFST); > > + } else { > > I guess you need to check OCM_UEINTR_MASK here? Since we are dealing other interrupts in intr_handler(), this can be simple else. > > > + p->ue_cnt++; > > + p->ueinfo.data0 = readl(base + UE_FFD0_OFST); > > + p->ueinfo.data1 = readl(base + UE_FFD1_OFST); > > + p->ueinfo.addr = (OCM_BASEVAL | readl(base + > UE_FFA_OFST)); > > + writel(ECC_CTRL_CLR_UE_ERR, base + OCM_ISR_OFST); > > + } > > No, I didn't mean for you to drop that block. See comment in > intr_handler() below. Ok, will handle the warning in intr_handler() if it raises for other interrupts. > > > +/** > > + * handle_error - Handle controller error types CE and UE > > + * @dci: Pointer to the EDAC device controller instance > > + * @p: Pointer to the OCM ECC status structure > > + * > > + * Handles the controller ECC correctable and uncorrectable error. > > s/controller// - we know it is the controller. You probably should go through > all text in here and tone down all the "controller" mentions. Ok, will update. > > > + */ > > +static void handle_error(struct edac_device_ctl_info *dci, struct > > +ecc_status *p) { > > + struct edac_priv *priv = dci->pvt_info; > > + struct ecc_error_info *pinf; > > + > > + if (p->ce_cnt) { > > + pinf = &p->ceinfo; > > + snprintf(priv->message, ZYNQMP_OCM_EDAC_MSG_SIZE, > > + "\nOCM ECC error type :%s\nAddr: [0x%x]\nFault > Data[0x%08x%08x]", > > + "CE", pinf->addr, pinf->data1, pinf->data0); > > + edac_device_handle_ce(dci, 0, 0, priv->message); > > + } > > + > > + if (p->ue_cnt) { > > + pinf = &p->ueinfo; > > + snprintf(priv->message, ZYNQMP_OCM_EDAC_MSG_SIZE, > > + "\nOCM ECC error type :%s\nAddr: [0x%x]\nFault > Data[0x%08x%08x]", > > + "UE", pinf->addr, pinf->data1, pinf->data0); > > + edac_device_handle_ue(dci, 0, 0, priv->message); > > + } > > + > > + memset(p, 0, sizeof(*p)); > > +} > > + > > +/** > > + * intr_handler - ISR routine > > + * @irq: irq number > > + * @dev_id: device id pointer > > + * > > + * Return: IRQ_NONE, if interrupt not set or IRQ_HANDLED otherwise > > +*/ static irqreturn_t intr_handler(int irq, void *dev_id) { > > + struct edac_device_ctl_info *dci = dev_id; > > + struct edac_priv *priv = dci->pvt_info; > > + int regval; > > + > > + regval = readl(priv->baseaddr + OCM_ISR_OFST); > > + if (!(regval & OCM_CEUE_MASK)) > > + return IRQ_NONE; > > What is that check for? > > If neither of the masks are set but the device still raises an error interrupt, > then you need to WARN_ONCE() here so that people look at this and debug it > as to why it still raised an interrupt. Ok, will update. > > > + get_error_info(priv->baseaddr, &priv->stat, regval); > > + > > + priv->ce_cnt += priv->stat.ce_cnt; > > + priv->ue_cnt += priv->stat.ue_cnt; > > + handle_error(dci, &priv->stat); > > + > > + return IRQ_HANDLED; > > +} > > + > > +/** > > + * get_eccstate - Return the controller ECC status > > + * @base: Pointer to the OCM memory controller base address > > + * > > + * Get the ECC enable/disable status for the controller > > + * > > + * Return: ECC status 0/1. > > + */ > > +static bool get_eccstate(void __iomem *base) { > > + return readl(base + ECC_CTRL_OFST) & OCM_ECC_ENABLE_MASK; } > > + > > +#ifdef CONFIG_EDAC_DEBUG > > +/** > > + * inject_fault_count - write fault injection count > > + * @priv: Pointer to the EDAC private struct > > + * > > + * Update the fault injection count register, once the counter > > +reaches > > + * zero, it injects errors > > + */ > > +static void inject_fault_count(struct edac_priv *priv) { > > + u32 ficount = priv->fault_injection_cnt; > > + > > + ficount &= OCM_FICOUNT_MASK; > > + if (ficount != priv->fault_injection_cnt) > > Do this: > > if (ficount & ~OCM_FICOUNT_MASK) { > ficount &= OCM_FICOUNT_MASK; > edac_printk(KERN_INFO, EDAC_DEVICE, "Fault injection > count value truncated to: %d\n", ficount); > } > > i.e., mask it *only* when it is larger. Ok, will update. > > > + > > + writel(ficount, priv->baseaddr + OCM_FIC_OFST); } > > + > > +/** > > + * inject_cebitpos - Set CE bit position > > + * @priv: Pointer to the EDAC private struct > > + * > > + * Set any one bit to inject CE error */ static void > > +inject_cebitpos(struct edac_priv *priv) { > > + if (priv->ce_bitpos <= UE_MAX_BITPOS_LOWER) { > > + writel(BIT(priv->ce_bitpos), priv->baseaddr + > OCM_FID0_OFST); > > + writel(0, priv->baseaddr + OCM_FID1_OFST); > > + } else { > > + writel(BIT(priv->ce_bitpos - UE_MIN_BITPOS_UPPER), > > + priv->baseaddr + OCM_FID1_OFST); > > + writel(0, priv->baseaddr + OCM_FID0_OFST); > > I had to stare at this a bit to figure out what you're doing: the injection > registers are two 32-bit quantities and depending on where you inject, you > need to select into which offset to write it. > > But looking more at this, all the proper range checks should happen in the > debugfs callbacks, i.e., inject_ce_write() in this case. > > The actual injection function should only inject - that's it. > > And come to think of it, you don't need inject_cebitpos() or inject_uebitpos(). > > Your debugfs callbacks should: > > 1. check the range, error out and print a warning if range wrong 2. inject > otherwise. > > and that's it. Ok, will re-arrange the logic. > > ... > > > +static ssize_t inject_ue_write(struct file *file, const char __user *data, > > + size_t count, loff_t *ppos) { > > + struct edac_device_ctl_info *edac_dev = file->private_data; > > + struct edac_priv *priv = edac_dev->pvt_info; > > + char buf[6]; > > + u8 pos0, pos1, len; > > + int ret; > > The EDAC tree preferred ordering of variable declarations at the beginning of > a function is reverse fir tree order:: > > struct long_struct_name *descriptive_name; > unsigned long foo, bar; > unsigned int tmp; > int ret; > > The above is faster to parse than the reverse ordering:: > > int ret; > unsigned int tmp; > unsigned long foo, bar; > struct long_struct_name *descriptive_name; > > And even more so than random ordering:: > > unsigned long foo, bar; > int ret; > struct long_struct_name *descriptive_name; > unsigned int tmp; > > You're pretty much doing it but some functions' local vars still need re- > sorting. Ok, will update. > > > + > > + if (!data) > > + return -EFAULT; > > + > > + len = min_t(size_t, count, sizeof(buf)); > > + if (copy_from_user(buf, data, len)) > > + return -EFAULT; > > + > > + for (pos0 = 0; buf[pos0] != ' ' && pos0 <= len; pos0++) > > + ; > > + > > + if (pos0 > len) > > + return -EINVAL; > > + > > + ret = kstrtou8_from_user(data, pos0, 0, &priv->ue_bitpos[0]); > > + if (ret) > > + return ret; > > + > > + for (pos1 = pos0 + 1; buf[pos1] != '\n' && pos1 <= len; pos1++) > > + ; > > + > > + if (pos1 > count) > > + return -EINVAL; > > + > > + ret = kstrtou8_from_user(&data[pos0 + 1], pos1 - pos0 - 1, 0, > > + &priv->ue_bitpos[1]); > > This looks like it is composing multiple values. I guess the two bits for an UE > since UE is a two-bit error. > > No documentation? > > IOW, you need to document how this injection works. In a comment here > somewhere, pls explain what the user is supposed to do when she wants to > inject errors. > > Example: > > Documentation/firmware-guide/acpi/apei/einj.rst > > You don't have to do a separate file and go too much into detail but at least a > simple injection recipe/example would be prudent to have. Ok, will update API documentation like below. echo <fault_injection count> > /sys/kernel/debug/edac/ff960000.memory-controller/inject_fault_count echo <bit pos0> <bit pos1> > /sys/kernel/debug/edac/ff960000.memory-controller/inject_ue_bitpos > > > +static int edac_probe(struct platform_device *pdev) { > > + struct edac_priv *priv; > > + struct edac_device_ctl_info *dci; > > + void __iomem *baseaddr; > > + struct resource *res; > > + int irq, ret; > > + > > + baseaddr = devm_platform_get_and_ioremap_resource(pdev, 0, > &res); > > + if (IS_ERR(baseaddr)) > > + return PTR_ERR(baseaddr); > > + > > + if (!get_eccstate(baseaddr)) { > > + edac_printk(KERN_INFO, EDAC_DEVICE, > > + "ECC not enabled\n"); > > No need to break that line. Ok, will update. > > > + return -ENXIO; > > + } > > + > > + dci = edac_device_alloc_ctl_info(sizeof(*priv), > ZYNQMP_OCM_EDAC_STRING, > > + 1, ZYNQMP_OCM_EDAC_STRING, 1, > 0, NULL, 0, > > + edac_device_alloc_index()); > > + if (!dci) > > + return -ENOMEM; > > + > > + priv = dci->pvt_info; > > + platform_set_drvdata(pdev, dci); > > + dci->dev = &pdev->dev; > > + priv->baseaddr = baseaddr; > > + dci->mod_name = pdev->dev.driver->name; > > + dci->ctl_name = ZYNQMP_OCM_EDAC_STRING; > > + dci->dev_name = dev_name(&pdev->dev); > > + > > + irq = platform_get_irq(pdev, 0); > > + if (irq < 0) { > > + ret = irq; > > + goto free_dev_ctl; > > + } > > + > > + ret = devm_request_irq(&pdev->dev, irq, intr_handler, 0, > > + dev_name(&pdev->dev), dci); > > + if (ret) { > > + edac_printk(KERN_ERR, EDAC_DEVICE, "Failed to request > Irq\n"); > > + goto free_dev_ctl; > > + } > > + > > + /* Enable UE, CE interrupts */ > > + writel(OCM_CEUE_MASK, priv->baseaddr + OCM_IEN_OFST); > > + > > +#ifdef CONFIG_EDAC_DEBUG > > + setup_debugfs(dci); > > +#endif > > Do this instead: > > diff --git a/drivers/edac/zynqmp_ocm_edac.c > b/drivers/edac/zynqmp_ocm_edac.c index 32f025b72380..a2b8cf1eb986 > 100644 > --- a/drivers/edac/zynqmp_ocm_edac.c > +++ b/drivers/edac/zynqmp_ocm_edac.c > @@ -428,9 +428,8 @@ static int edac_probe(struct platform_device *pdev) > /* Enable UE, CE interrupts */ > writel(OCM_CEUE_MASK, priv->baseaddr + OCM_IEN_OFST); > > -#ifdef CONFIG_EDAC_DEBUG > - setup_debugfs(dci); > -#endif > + if (IS_ENABLED(CONFIG_EDAC_DEBUG)) > + setup_debugfs(dci); > > ret = edac_device_add_device(dci); > if (ret) > > below too. Ok, will update. > > > + > > + ret = edac_device_add_device(dci); > > + if (ret) > > + goto free_dev_ctl; > > + > > + return 0; > > + > > +free_dev_ctl: > > + edac_device_free_ctl_info(dci); > > + > > + return ret; > > +} > > + > > +static int edac_remove(struct platform_device *pdev) { > > + struct edac_device_ctl_info *dci = platform_get_drvdata(pdev); > > + struct edac_priv *priv = dci->pvt_info; > > + > > + /* Disable UE, CE interrupts */ > > + writel(OCM_CEUE_MASK, priv->baseaddr + OCM_IDS_OFST); > > + > > +#ifdef CONFIG_EDAC_DEBUG > > + debugfs_remove_recursive(priv->debugfs_dir); > > +#endif > > + > > + edac_device_del_device(&pdev->dev); > > + edac_device_free_ctl_info(dci); > > + > > + return 0; > > +} > > + > > +static const struct of_device_id zynqmp_ocm_edac_match[] = { > > + { .compatible = "xlnx,zynqmp-ocmc-1.0"}, > > + { /* end of table */ } > > +}; > > + > > +MODULE_DEVICE_TABLE(of, zynqmp_ocm_edac_match); > > + > > +static struct platform_driver zynqmp_ocm_edac_driver = { > > + .driver = { > > + .name = "zynqmp-ocm-edac", > > + .of_match_table = zynqmp_ocm_edac_match, > > + }, > > + .probe = edac_probe, > > + .remove = edac_remove, > > +}; > > + > > +module_platform_driver(zynqmp_ocm_edac_driver); > > + > > +MODULE_AUTHOR("Advanced Micro Devices, Inc"); > > +MODULE_DESCRIPTION("Xilinx ZynqMP OCM ECC driver"); > > +MODULE_LICENSE("GPL"); > > -- > > 2.25.1 > > > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette
<<attachment: winmail.dat>>