On Fri, Jan 09, 2015 at 02:53:55AM +0000, tthayer@xxxxxxxxxxxxxxxxxxxxx wrote: > From: Thor Thayer <tthayer@xxxxxxxxxxxxxxxxxxxxx> > > Adding L2 Cache and On-Chip RAM EDAC support for the > Altera SoCs using the EDAC device model. The SDRAM > controller is using the Memory Controller model. > > Each type of ECC is individually configurable. > > The SDRAM ECC is a separate Kconfig option because: > 1) the SDRAM preparation can take almost 2 seconds on boot and some > customers need a faster boot time. > 2) the SDRAM has an ECC initialization dependency on the preloader > which is outside the kernel. It is desirable to be able to turn the > SDRAM on & off separately. > > Signed-off-by: Thor Thayer <tthayer@xxxxxxxxxxxxxxxxxxxxx> > --- > v2: Fix L2 dependency comments. > > v3: Move OCRAM and L2 cache EDAC functions into altera_edac.c > instead of separate files. > > v4: Change mask defines to use BIT(). > Fix comment style to agree with kernel coding style. > Better printk description for read != write in trigger. > Remove SysFS debugging message. > Better dci->mod_name > Move gen_pool pointer assignment to end of function. > Invert logic to reduce indent in ocram depenency check. > Change from dev_err() to edac_printk() > Replace magic numbers with defines & comments. > Improve error injection test. > Change Makefile intermediary name to altr (from alt) > > v5: No change. > > v6: Convert to nested EDAC in device tree. Force L2 cache > on for L2Cache ECC & remove L2 cache syscon for checking > enable bit. Update year in header. > --- > drivers/edac/Kconfig | 16 ++ > drivers/edac/Makefile | 5 +- > drivers/edac/altera_edac.c | 506 +++++++++++++++++++++++++++++++++++++++++++- > 3 files changed, 524 insertions(+), 3 deletions(-) [...] > +/************************* EDAC Parent Probe *************************/ > + > +static const struct of_device_id altr_edac_device_of_match[]; Huh? What's this for? > +static const struct of_device_id altr_edac_of_match[] = { > + { .compatible = "altr,edac" }, > + {}, > +}; > +MODULE_DEVICE_TABLE(of, altr_edac_of_match); I know it may seem like a minor thing, but the documentation really should have come in an earlier patch. It's painful to review a patch series when you have to randomly just to the end of hte series to see the documentation. The name is _very_ generic here. Do we not have a more specific name for the EDAC block in your SoC? Is there actually a specific EDAC device, or are you just grouping some portions of HW blocks into an EDAC device to match what the Linux EDAC framework wants? [...] > +ssize_t altr_edac_device_trig(struct edac_device_ctl_info *edac_dci, > + const char *buffer, size_t count) > +{ > + u32 *ptemp, i, error_mask; > + int result = 0; > + unsigned long flags; > + struct altr_edac_device_dev *drvdata = edac_dci->pvt_info; > + const struct edac_device_prv_data *priv = drvdata->data; > + void *generic_ptr = edac_dci->dev; Huh? What's hidden behind this "generic_ptr"? > + > + if (!priv->alloc_mem) > + return -ENOMEM; > + > + /* > + * Note that generic_ptr is initialized to the device * but in > + * some alloc_functions, this is overridden and returns data. > + */ > + ptemp = priv->alloc_mem(priv->trig_alloc_sz, &generic_ptr); > + if (!ptemp) { > + edac_printk(KERN_ERR, EDAC_DEVICE, > + "Inject: Buffer Allocation error\n"); > + return -ENOMEM; > + } > + > + if (buffer[0] == ALTR_UE_TRIGGER_CHAR) > + error_mask = priv->ue_set_mask; > + else > + error_mask = priv->ce_set_mask; > + > + edac_printk(KERN_ALERT, EDAC_DEVICE, > + "Trigger Error Mask (0x%X)\n", error_mask); > + > + local_irq_save(flags); > + /* write ECC corrupted data out. */ > + for (i = 0; i < (priv->trig_alloc_sz / sizeof(*ptemp)); i++) { > + /* Read data so we're in the correct state */ > + rmb(); > + if (ACCESS_ONCE(ptemp[i])) > + result = -1; Perhaps s/result/err/? You could even have an err_count, which would give you slightly more useful output. > + /* Toggle Error bit (it is latched), leave ECC enabled */ > + writel(error_mask, drvdata->base); > + writel(priv->ecc_enable_mask, drvdata->base); > + ptemp[i] = i; > + } > + /* Ensure it has been written out */ > + wmb(); > + local_irq_restore(flags); > + > + if (result) > + edac_printk(KERN_ERR, EDAC_DEVICE, "Mem Not Cleared\n"); > + > + /* Read out written data. ECC error caused here */ > + for (i = 0; i < ALTR_TRIGGER_READ_WRD_CNT; i++) > + if (ACCESS_ONCE(ptemp[i]) != i) > + edac_printk(KERN_ERR, EDAC_DEVICE, > + "Read doesn't match written data\n"); > + > + if (priv->free_mem) > + priv->free_mem(ptemp, priv->trig_alloc_sz, generic_ptr); > + > + return count; > +} [...] > +static void *ocram_alloc_mem(size_t size, void **other) > +{ > + struct device_node *np; > + struct gen_pool *gp; > + void *sram_addr; > + > + np = of_find_compatible_node(NULL, NULL, "altr,ocram-edac"); > + if (!np) > + return NULL; > + > + gp = of_get_named_gen_pool(np, "iram", 0); > + if (!gp) { > + of_node_put(np); > + return NULL; > + } > + of_node_put(np); > + > + sram_addr = (void *)gen_pool_alloc(gp, size / sizeof(size_t)); I'm still very much confused by this sizeof(size_t) division, but hopefully your response to my earlier query will answer that. [...] > +static void *l2_alloc_mem(size_t size, void **other) > +{ > + struct device *dev = *other; > + void *ptemp = devm_kzalloc(dev, size, GFP_KERNEL); > + > + if (!ptemp) > + return NULL; > + > + /* Make sure everything is written out */ > + wmb(); > + > + /* > + * Clean all cache levels up to LoC (includes L2) > + * This ensures the corrupted data is written into > + * L2 cache for readback test (which causes ECC error). > + */ > + flush_cache_all(); I'm a little confused by this comment (especially w.r.t. the L2 being before the LoC). Are we attempting to flush everything _to_ the L2, or beyond/out-of the L2? As far as I can see flush_cache_all will only achieve the former, and not the latter, as it doesn't seem to perform any outer cache operations. Per the architecture, Set/Way maintenance of this form won't always act as you expect (though I believe that for Cortex-A9 it does). > + > + return ptemp; > +} > + > +static void l2_free_mem(void *p, size_t size, void *other) > +{ > + struct device *dev = other; > + > + if (dev && p) > + devm_kfree(dev, p); Is this ever called in that case? Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html