Re: [PATCH v5 5/5] EDAC: Versal NET: Add support for error notification

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 27, 2025 at 11:32:10AM +0000, Datta, Shubhrajyoti wrote:
> > > +union edac_info {
> >
> > What is an "edac_info"?
> This is the row and column positions.
> We are using it to extract the position from the address decoder registers.

Needs a better name which is descriptive as to how it is used or what it
represents.

> > > +static void get_ddr_ue_error_info(u32 error_data[REGS_PER_CONTROLLER],
> > struct edac_priv *priv)
> >                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > What is that for?
> >
> The error_data contains the register values. Linux does not have access to the DDRMC register
> Space. It queries it from the NMC and gets the the data from the rpmsg callback.

I wasn't clear. Arrays in C are passed as pointers - the compiler does that
anyway.  You don't have to do this weird parameter specification.

> > > +     mci->edac_cap = EDAC_FLAG_SECDED;
> > > +     mci->ctl_name = "amd_ddr_controller";
> > > +     mci->dev_name = dev_name(&pdev->dev);
> > > +     mci->mod_name = "amd_edac";
> >
> > Do:
> >
> > git grep mod_name drivers/edac/
> >
> > to get an idea how those names are chosen.
> #define EDAC_MOD_STR    "r82600_edac"
> mci->mod_name = EDAC_MOD_STR;
> https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/edac/r82600_edac.c?h=v6.14-rc4#n316
> 
> https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/edac/i5000_edac.c?h=v6.14-rc4#n1424
> mci->mod_name = "i5000_edac.c";
> https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/edac/highbank_mc_edac.c?h=v6.14-rc4#n218
> mci->mod_name = pdev->dev.driver->name;
> 
> let me know if mci->mod_name = pdev->dev.driver->name; is fine.

I think you didn't get me again.

amd64_edac.c - the x86 driver is called this:

#define EDAC_MOD_STR "amd64_edac"

Calling yours "amd_edac" doesn't work.

"versalnet_edac"? That's probably better.

> > You don't need "inline" - the compiler can decide that itself. And
> > "process_bit" needs a better name.
> 
> Will rename it to populate_row_bit

Or "assign_row_bit" or whatever.

The function name should be describing what the function does as closely as
possible.

> > Why are those functions copying stuff around? Why aren't you using cols
> > directly?
> 
> The column bit position is used in converting to the physical address.
> We read it at init and use it every time an error occurs to get the address.
> Did you mean to remove the regval. Or read the error_data every time.

I mean simply use cols instead of assigning stuff to priv->col_bit* and then
using that.

> > Why is probing a work item?
> >
> > Explaining *that* is what a commit message is for - not for repeating useless
> > info.
> The RPMsg probe is invoked from a thread within the virtio driver responsible
> for processing the response ring. If the probe initiates an mcdi API call, it blocks
> until the mcdi response is received. However, since the mcdi response is also processed
> by the same thread that triggered the rpmsg probe, the thread remains blocked,
> preventing it from handling the response. This results in a deadlock.
> 
> To prevent it we have a work scheduled.

This is just insane.

I don't see anything in amd_setup_mcdi() that needs some response from some
mcdi thing. If not, you don't need the work queue thing.

> >
> > > +     for (i = 0; i < NUM_CONTROLLERS; i++) {
> > > +             config = priv->adec[CONF + i];
> > > +             num_chans = FIELD_GET(DDRMC5_NUM_CHANS_MASK, config);
> > > +             rank = FIELD_GET(DDRMC5_RANK_MASK, config);
> > > +             rank = 1 << rank;
> > > +             dwidth = FIELD_GET(DDRMC5_BUS_WIDTH_MASK, config);
> > > +             dt = get_dwidth(dwidth);
> > > +             if (dt != DEV_UNKNOWN)
> > > +                     break;
> > > +     }
> >
> > What is that loop supposed to do? Find the last controller before the one
> > with DEV_UNKNOWN device width and register that one?
> 
> There are 8 controllers all we try to get the first one that is enabled and register that one.
> We use the device unknown to know if that is enabled or not.

The first one that is enabled has unknown device width? What?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette




[Index of Archives]     [Device Tree Compilter]     [Device Tree Spec]     [Linux Driver Backports]     [Video for Linux]     [Linux USB Devel]     [Linux PCI Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Yosemite Backpacking]


  Powered by Linux