Re: [PATCH v1 2/8] platform/mellanox: mlxreg-hotplug: Improve mechanism of ASIC health discovery

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Jul 08, 2018 at 02:31:58PM +0000, Vadim Pasternak wrote:
> Extend the logic of ASIC health discovery.
> ASIC device can indicate its health state as a good, booting or dormant.
> During ASIC reset the device is dropped to dormant state and should get to
> the stable good health state through the intermediate booting state.
> The sequence for getting to the steady state health after reset is:
> dormant -> booting -> [booting -> good -> dormant -> booting] -> good.
> The sequence in brackets can be repeated several times.
> Initial implementation assumes that the this sequence is always repeated
> by the constant number. This patch allows to perform health steady state
> discovery independently of the repetitions number.
> 
> Signed-off-by: Vadim Pasternak <vadimp@xxxxxxxxxxxx>
> ---
>  drivers/platform/mellanox/mlxreg-hotplug.c | 38 +++++++++++++++++++++++++++---
>  1 file changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/platform/mellanox/mlxreg-hotplug.c b/drivers/platform/mellanox/mlxreg-hotplug.c
> index ac97aa0..f32637e 100644
> --- a/drivers/platform/mellanox/mlxreg-hotplug.c
> +++ b/drivers/platform/mellanox/mlxreg-hotplug.c
> @@ -51,8 +51,9 @@
>  #define MLXREG_HOTPLUG_AGGR_MASK_OFF	1
>  
>  /* ASIC health parameters. */
> +#define MLXREG_HOTPLUG_DOWN_MASK	0x00
>  #define MLXREG_HOTPLUG_HEALTH_MASK	0x02
> -#define MLXREG_HOTPLUG_RST_CNTR		3
> +#define MLXREG_HOTPLUG_RST_CNTR		2
>  
>  #define MLXREG_HOTPLUG_ATTRS_MAX	24
>  #define MLXREG_HOTPLUG_NOT_ASSERT	3
> @@ -325,20 +326,51 @@ mlxreg_hotplug_health_work_helper(struct mlxreg_hotplug_priv_data *priv,
>  			goto out;
>  
>  		regval &= data->mask;
> -		item->cache = regval;
> +		/*
> +		 * ASIC health indication is provided through two bits. Bits
> +		 * value 0x2 indicates that ASIC reached the good health, value
> +		 * 0x0 indicates ASIC the bad health or dormant state and value
> +		 * 0x2 indicates the booting state. During ASIC reset it should

A bit odd to have specific values in paragraph form in comments,
generally I'd expect to see these as defines, but as this is a counter
and we aren't using these specific values in the logic... OK - but
perhaps something more obvious like an itemized list:

		  0: Dormant
		  1: Booting
		  2: Good

Note that above the text states that 0x2 is both "good" and "booting",
we'll need to at least correct that. Since it is a counter, and not a
bitmask, the 0x prefix isn't needed in my opinion.

> +		 * pass the following states: dormant -> booting -> good.
> +		 * The transition from dormant to booting state and from
> +		 * booting to good state are indicated by ASIC twice, so actual
> +		 * sequence for getting to the steady state after reset is:
> +		 * dormant -> booting -> booting -> good -> good. It is
> +		 * possible that due to some hardware noise, the transition
> +		 * sequence will look like: dormant -> booting -> [ booting ->
> +		 * good -> dormant -> booting ] -> good -> good.
> +		 */

What does it mean to go from good -> good ?? Is that an intention transition to
the same state? It seems the above state transition diagram could be simplified
considerably like:

(dormant -> booting -> good)+

Where the + indicates this occurs 1 or more times. If this isn't
accurate - what am I missing?

>  		if (regval == MLXREG_HOTPLUG_HEALTH_MASK) {
> -			if ((data->health_cntr++ == MLXREG_HOTPLUG_RST_CNTR) ||
> +			if ((++data->health_cntr == MLXREG_HOTPLUG_RST_CNTR) ||

This is fairly confusing to read, 2 indicates good (I think, see above),
but we're comparing it to "RST_CNTR" - what does that mean? Using
another define which is more explicit would help with legibility.

Hrm... both HEALTH_MASK and RST_CNTR are defined as 2.... which of these
is testing for the good state?

>  			    !priv->after_probe) {
> +				/*
> +				 * ASIC is in steady state. Connect associated
> +				 * device, if configured.
> +				 */
>  				mlxreg_hotplug_device_create(priv, data);
>  				data->attached = true;
>  			}
>  		} else {
>  			if (data->attached) {
> +				/*
> +				 * ASIC health is dropped after ASIC has been

What does "health is dropped" mean? Is this a state transition? good to
dormant? If so, using the same terminology would help legibility.

> +				 * in steady state. Disconnect associated
> +				 * device, if it has been connected.
> +				 */
>  				mlxreg_hotplug_device_destroy(data);
>  				data->attached = false;
>  				data->health_cntr = 0;
> +			} else if (regval == MLXREG_HOTPLUG_DOWN_MASK &&
> +				   item->cache == MLXREG_HOTPLUG_HEALTH_MASK) {
> +				/*
> +				 * Decrease counter, if health has been dropped
> +				 * before ASIC reaches the steady state, like:
> +				 * good -> dormant -> booting.
> +				 */
> +				data->health_cntr--;
>  			}
>  		}
> +		item->cache = regval;

I found this to be pretty confusing. The description of the states above
don't seem to map readily to the logic above.

>  
>  		/* Acknowledge event. */
>  		ret = regmap_write(priv->regmap, data->reg +
> -- 
> 2.1.4
> 
> 

-- 
Darren Hart
VMware Open Source Technology Center



[Index of Archives]     [Linux Kernel Development]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux