On Fri, Sep 30, 2022 at 02:26:56AM +0300, Serge Semin wrote: > The generic DW uMCTL2 DDRC v3.x support was added in commit f7824ded4149 > ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR"). It > hasn't been done quiet well there with respect to the IRQs handling > procedure. An attempt to fix that was introduced in the recent commit > 4bcffe941758 ("EDAC/synopsys: Re-enable the error interrupts on v3 hw"). > Alas again it didn't provide quite complete solution. Because? Btw, for the future, you should make sure you add those commit authors to Cc so that they can get a chance to comment. Adding the folks from that commit to Cc. > First of all the commit f7824ded4149 ("EDAC/synopsys: Add support for > version 3 of the Synopsys EDAC DDR") log says that v3.80a "has UE/CE auto > cleared". They aren't in none of the IP-core versions. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ What does that sentence mean exactly? The UE/CE auto clearing functionality is not in that silicon? > The IRQ status can be cleared by means of setting the ECCCLR/ECCCTL > register self-cleared flags 0-3. I'm guessing that's this reg: /* ECC control register */ #define ECC_CTRL_OFST 0xC4 ? > The pending IRQ clearance is done in the respective get_error_info() > method of the driver. Thus defining a quirk flag with the > "DDR_ECC_INTR_SELF_CLEAR" name was at least very inaccurate if not to > say misleading. > > So was adding the comments about the "ce/ue bits automatically > cleared". Aah, you mean that the ->get_error_info() functions are doing the clearing even if something should be doing self clearing. And that DDR_ECC_INTR_SELF_CLEAR thing is queried when enabling the error interrupt which is just bad naming because it looks like that quirk controls what register to write/read. > Second, disabling the being handled IRQ in the handler doesn't make sense > in Linux since the IC line is masked during that procedure anyway. So > disabling the IRQ in one part of the handler and enabling it at the end of > the method is simply redundant. (See, the ZynqMP-specific code with the > QoS IRQ CSR didn't do that originally.) So what is this commit message of 4bcffe941758 ("EDAC/synopsys: Re-enable the error interrupts on v3 hw") then talking about: "Then the interrupt handler will be called only once." How is that interrupt supposed to be reenabled? > Finally calling the zynqmp_get_error_info() method concurrently with the > enable_irq()/disable_irq() functions causes the IRQs mask state race > condition. Starting from DW uMCTL2 DDRC IP-core v3.10a [1] the ECCCLR > register has been renamed to ECCCTL and has been equipped with CE/UE IRQs > enable/disable flags [2]. Aha, that sounds like the right thing to toggle. > So the CSR now serves for the IRQ status and control functions used > concurrently during the IRQ handling and the IRQ disabling/enabling. > Thus the corresponding critical section must be protected with the > IRQ-safe spin-lock. > So let's fix all the problems noted above. First the > DDR_ECC_INTR_SELF_CLEAR flag is renamed to SYNPS_ZYNQMP_IRQ_REGS. "Describe your changes in imperative mood, e.g. "make xyzzy do frotz" instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy to do frotz", as if you are giving orders to the codebase to change its behaviour." In this case, pls formulate it something like this: "So fix all these problems noted above: rename DDR_ECC_INTR_SELF_CLEAR to SYNPS_ZYNQMP_IRQ_REGS to denote that, ..." And so on. > Its semantic is now the opposite: the quirk means having the ZynqMP > IRQ CSRs available on the platform. Yes, that makes more sense. > Second the DDR_UE_MASK and DDR_CE_MASK macros > are renamed to imply being used in the framework of the ECCCLR/ECCCTL CSRs > accesses. Third all the misleading comments are removed. Finally the > ECC_CLR_OFST register IOs are now protected with the IRQ-safe spin-lock > taken in order to prevent the IRQ status clearance and IRQ enable/disable > race condition. > > [1] DesignWare Cores Enhanced Universal DDR Memory and Protocol > Controllers (uMCTL2/uPCTL2), Release Notes, Version 3.91a, October 2020, > p. 27. > [2] DesignWare® Cores Enhanced Universal DDR Memory Controller (uMCTL2), > Databook Version 3.91a, October 2020, p.818-819. If those are not publicly accessible, then there's no point to put them in here. > Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR") > Signed-off-by: Serge Semin <Sergey.Semin@xxxxxxxxxxxxxxxxxxxx> Does this need to go to stable@ and thus older kernels? > --- > drivers/edac/synopsys_edac.c | 76 +++++++++++++++++++++++------------- > 1 file changed, 48 insertions(+), 28 deletions(-) > @@ -300,6 +299,7 @@ struct synps_ecc_status { > /** > * struct synps_edac_priv - DDR memory controller private instance data. > * @baseaddr: Base address of the DDR controller. > + * @lock: Concurrent CSRs access lock. > * @message: Buffer for framing the event specific info. > * @stat: ECC status information. > * @p_data: Platform data. > @@ -314,6 +314,7 @@ struct synps_ecc_status { > */ > struct synps_edac_priv { > void __iomem *baseaddr; > + spinlock_t lock; That lock needs to be named properly and have a comment above it what it protects. > char message[SYNPS_EDAC_MSG_SIZE]; > struct synps_ecc_status stat; > const struct synps_platform_data *p_data; ... > @@ -516,24 +523,42 @@ static void handle_error(struct mem_ctl_info *mci, struct synps_ecc_status *p) > > static void enable_intr(struct synps_edac_priv *priv) > { > + unsigned long flags; > + > /* Enable UE/CE Interrupts */ > - if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR) > - writel(DDR_UE_MASK | DDR_CE_MASK, > - priv->baseaddr + ECC_CLR_OFST); > - else > + if (priv->p_data->quirks & SYNPS_ZYNQMP_IRQ_REGS) { > writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK, > priv->baseaddr + DDR_QOS_IRQ_EN_OFST); > > + return; > + } > + > + /* IRQs Enable/Disable feature has been available since v3.10a */ How does this comment help here? If it is available since a version number, why doesn't the below check a version number? IOW, what is the relevance of that comment here? In any case, I need to hear from this driver's maintainer too. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette