On Sun, Jul 08, 2018 at 02:31:58PM +0000, Vadim Pasternak wrote: > Extend the logic of ASIC health discovery. > ASIC device can indicate its health state as a good, booting or dormant. > During ASIC reset the device is dropped to dormant state and should get to > the stable good health state through the intermediate booting state. > The sequence for getting to the steady state health after reset is: > dormant -> booting -> [booting -> good -> dormant -> booting] -> good. > The sequence in brackets can be repeated several times. > Initial implementation assumes that the this sequence is always repeated > by the constant number. This patch allows to perform health steady state > discovery independently of the repetitions number. > > Signed-off-by: Vadim Pasternak <vadimp@xxxxxxxxxxxx> > --- > drivers/platform/mellanox/mlxreg-hotplug.c | 38 +++++++++++++++++++++++++++--- > 1 file changed, 35 insertions(+), 3 deletions(-) > > diff --git a/drivers/platform/mellanox/mlxreg-hotplug.c b/drivers/platform/mellanox/mlxreg-hotplug.c > index ac97aa0..f32637e 100644 > --- a/drivers/platform/mellanox/mlxreg-hotplug.c > +++ b/drivers/platform/mellanox/mlxreg-hotplug.c > @@ -51,8 +51,9 @@ > #define MLXREG_HOTPLUG_AGGR_MASK_OFF 1 > > /* ASIC health parameters. */ > +#define MLXREG_HOTPLUG_DOWN_MASK 0x00 > #define MLXREG_HOTPLUG_HEALTH_MASK 0x02 > -#define MLXREG_HOTPLUG_RST_CNTR 3 > +#define MLXREG_HOTPLUG_RST_CNTR 2 > > #define MLXREG_HOTPLUG_ATTRS_MAX 24 > #define MLXREG_HOTPLUG_NOT_ASSERT 3 > @@ -325,20 +326,51 @@ mlxreg_hotplug_health_work_helper(struct mlxreg_hotplug_priv_data *priv, > goto out; > > regval &= data->mask; > - item->cache = regval; > + /* > + * ASIC health indication is provided through two bits. Bits > + * value 0x2 indicates that ASIC reached the good health, value > + * 0x0 indicates ASIC the bad health or dormant state and value > + * 0x2 indicates the booting state. During ASIC reset it should A bit odd to have specific values in paragraph form in comments, generally I'd expect to see these as defines, but as this is a counter and we aren't using these specific values in the logic... OK - but perhaps something more obvious like an itemized list: 0: Dormant 1: Booting 2: Good Note that above the text states that 0x2 is both "good" and "booting", we'll need to at least correct that. Since it is a counter, and not a bitmask, the 0x prefix isn't needed in my opinion. > + * pass the following states: dormant -> booting -> good. > + * The transition from dormant to booting state and from > + * booting to good state are indicated by ASIC twice, so actual > + * sequence for getting to the steady state after reset is: > + * dormant -> booting -> booting -> good -> good. It is > + * possible that due to some hardware noise, the transition > + * sequence will look like: dormant -> booting -> [ booting -> > + * good -> dormant -> booting ] -> good -> good. > + */ What does it mean to go from good -> good ?? Is that an intention transition to the same state? It seems the above state transition diagram could be simplified considerably like: (dormant -> booting -> good)+ Where the + indicates this occurs 1 or more times. If this isn't accurate - what am I missing? > if (regval == MLXREG_HOTPLUG_HEALTH_MASK) { > - if ((data->health_cntr++ == MLXREG_HOTPLUG_RST_CNTR) || > + if ((++data->health_cntr == MLXREG_HOTPLUG_RST_CNTR) || This is fairly confusing to read, 2 indicates good (I think, see above), but we're comparing it to "RST_CNTR" - what does that mean? Using another define which is more explicit would help with legibility. Hrm... both HEALTH_MASK and RST_CNTR are defined as 2.... which of these is testing for the good state? > !priv->after_probe) { > + /* > + * ASIC is in steady state. Connect associated > + * device, if configured. > + */ > mlxreg_hotplug_device_create(priv, data); > data->attached = true; > } > } else { > if (data->attached) { > + /* > + * ASIC health is dropped after ASIC has been What does "health is dropped" mean? Is this a state transition? good to dormant? If so, using the same terminology would help legibility. > + * in steady state. Disconnect associated > + * device, if it has been connected. > + */ > mlxreg_hotplug_device_destroy(data); > data->attached = false; > data->health_cntr = 0; > + } else if (regval == MLXREG_HOTPLUG_DOWN_MASK && > + item->cache == MLXREG_HOTPLUG_HEALTH_MASK) { > + /* > + * Decrease counter, if health has been dropped > + * before ASIC reaches the steady state, like: > + * good -> dormant -> booting. > + */ > + data->health_cntr--; > } > } > + item->cache = regval; I found this to be pretty confusing. The description of the states above don't seem to map readily to the logic above. > > /* Acknowledge event. */ > ret = regmap_write(priv->regmap, data->reg + > -- > 2.1.4 > > -- Darren Hart VMware Open Source Technology Center