RE: [PATCH v1 2/8] platform/mellanox: mlxreg-hotplug: Improve mechanism of ASIC health discovery

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Darren Hart [mailto:dvhart@xxxxxxxxxxxxx]
> Sent: Wednesday, July 25, 2018 8:28 PM
> To: Vadim Pasternak <vadimp@xxxxxxxxxxxx>
> Cc: andy.shevchenko@xxxxxxxxx; gregkh@xxxxxxxxxxxxxxxxxxx; platform-
> driver-x86@xxxxxxxxxxxxxxx; jiri@xxxxxxxxxxx; Michael Shych
> <michaelsh@xxxxxxxxxxxx>; ivecera@xxxxxxxxxx
> Subject: Re: [PATCH v1 2/8] platform/mellanox: mlxreg-hotplug: Improve
> mechanism of ASIC health discovery
> 
> On Sun, Jul 08, 2018 at 02:31:58PM +0000, Vadim Pasternak wrote:
> > Extend the logic of ASIC health discovery.
> > ASIC device can indicate its health state as a good, booting or dormant.
> > During ASIC reset the device is dropped to dormant state and should
> > get to the stable good health state through the intermediate booting state.
> > The sequence for getting to the steady state health after reset is:
> > dormant -> booting -> [booting -> good -> dormant -> booting] -> good.
> > The sequence in brackets can be repeated several times.
> > Initial implementation assumes that the this sequence is always
> > repeated by the constant number. This patch allows to perform health
> > steady state discovery independently of the repetitions number.
> >
> > Signed-off-by: Vadim Pasternak <vadimp@xxxxxxxxxxxx>
> > ---
> >  drivers/platform/mellanox/mlxreg-hotplug.c | 38
> > +++++++++++++++++++++++++++---
> >  1 file changed, 35 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/platform/mellanox/mlxreg-hotplug.c
> > b/drivers/platform/mellanox/mlxreg-hotplug.c
> > index ac97aa0..f32637e 100644
> > --- a/drivers/platform/mellanox/mlxreg-hotplug.c
> > +++ b/drivers/platform/mellanox/mlxreg-hotplug.c
> > @@ -51,8 +51,9 @@
> >  #define MLXREG_HOTPLUG_AGGR_MASK_OFF	1
> >
> >  /* ASIC health parameters. */
> > +#define MLXREG_HOTPLUG_DOWN_MASK	0x00
> >  #define MLXREG_HOTPLUG_HEALTH_MASK	0x02
> > -#define MLXREG_HOTPLUG_RST_CNTR		3
> > +#define MLXREG_HOTPLUG_RST_CNTR		2
> >
> >  #define MLXREG_HOTPLUG_ATTRS_MAX	24
> >  #define MLXREG_HOTPLUG_NOT_ASSERT	3
> > @@ -325,20 +326,51 @@ mlxreg_hotplug_health_work_helper(struct
> mlxreg_hotplug_priv_data *priv,
> >  			goto out;
> >
> >  		regval &= data->mask;
> > -		item->cache = regval;
> > +		/*
> > +		 * ASIC health indication is provided through two bits. Bits
> > +		 * value 0x2 indicates that ASIC reached the good health, value
> > +		 * 0x0 indicates ASIC the bad health or dormant state and value
> > +		 * 0x2 indicates the booting state. During ASIC reset it should
> 
> A bit odd to have specific values in paragraph form in comments, generally I'd
> expect to see these as defines, but as this is a counter and we aren't using these
> specific values in the logic... OK - but perhaps something more obvious like an
> itemized list:

Hi Darren,

Thank you for review.

My mistake in comment. States from HW are:
	b00: Dormant
	b01: N/A
	b10: Good
	b11: Booting

These are bitmask actually. I'll fix the comment.

> 
> 		  0: Dormant
> 		  1: Booting
> 		  2: Good
> 
> Note that above the text states that 0x2 is both "good" and "booting", we'll
> need to at least correct that. Since it is a counter, and not a bitmask, the 0x
> prefix isn't needed in my opinion.
> 
> > +		 * pass the following states: dormant -> booting -> good.
> > +		 * The transition from dormant to booting state and from
> > +		 * booting to good state are indicated by ASIC twice, so actual
> > +		 * sequence for getting to the steady state after reset is:
> > +		 * dormant -> booting -> booting -> good -> good. It is
> > +		 * possible that due to some hardware noise, the transition
> > +		 * sequence will look like: dormant -> booting -> [ booting ->
> > +		 * good -> dormant -> booting ] -> good -> good.
> > +		 */
> 
> What does it mean to go from good -> good ?? Is that an intention transition to
> the same state? It seems the above state transition diagram could be simplified
> considerably like:
> 
> (dormant -> booting -> good)+

OK. I'll fix it.

Regarding booting - booting - good - good transition.
I retested it again with scope and this is just some noise on line and I can ignore
same -> same transition.
I'll modify this patch and re-send.
It also means that I can drop MLXREG_HOTPLUG_RST_CNTR at all.

> 
> Where the + indicates this occurs 1 or more times. If this isn't accurate - what
> am I missing?
> 
> >  		if (regval == MLXREG_HOTPLUG_HEALTH_MASK) {
> > -			if ((data->health_cntr++ ==
> MLXREG_HOTPLUG_RST_CNTR) ||
> > +			if ((++data->health_cntr ==
> MLXREG_HOTPLUG_RST_CNTR) ||
> 
> This is fairly confusing to read, 2 indicates good (I think, see above), but we're
> comparing it to "RST_CNTR" - what does that mean? Using another define which
> is more explicit would help with legibility.
> 
> Hrm... both HEALTH_MASK and RST_CNTR are defined as 2.... which of these is
> testing for the good state?
> 
> >  			    !priv->after_probe) {
> > +				/*
> > +				 * ASIC is in steady state. Connect associated
> > +				 * device, if configured.
> > +				 */
> >  				mlxreg_hotplug_device_create(priv, data);
> >  				data->attached = true;
> >  			}
> >  		} else {
> >  			if (data->attached) {
> > +				/*
> > +				 * ASIC health is dropped after ASIC has been
> 
> What does "health is dropped" mean? Is this a state transition? good to
> dormant? If so, using the same terminology would help legibility.
> 
> > +				 * in steady state. Disconnect associated
> > +				 * device, if it has been connected.
> > +				 */
> >  				mlxreg_hotplug_device_destroy(data);
> >  				data->attached = false;
> >  				data->health_cntr = 0;
> > +			} else if (regval == MLXREG_HOTPLUG_DOWN_MASK
> &&
> > +				   item->cache ==
> MLXREG_HOTPLUG_HEALTH_MASK) {
> > +				/*
> > +				 * Decrease counter, if health has been
> dropped
> > +				 * before ASIC reaches the steady state, like:
> > +				 * good -> dormant -> booting.
> > +				 */
> > +				data->health_cntr--;
> >  			}
> >  		}
> > +		item->cache = regval;
> 
> I found this to be pretty confusing. The description of the states above don't
> seem to map readily to the logic above.
> 
> >
> >  		/* Acknowledge event. */
> >  		ret = regmap_write(priv->regmap, data->reg +
> > --
> > 2.1.4
> >
> >
> 
> --
> Darren Hart
> VMware Open Source Technology Center




[Index of Archives]     [Linux Kernel Development]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux