> -----Original Message----- > From: Darren Hart [mailto:dvhart@xxxxxxxxxxxxx] > Sent: Wednesday, July 25, 2018 8:28 PM > To: Vadim Pasternak <vadimp@xxxxxxxxxxxx> > Cc: andy.shevchenko@xxxxxxxxx; gregkh@xxxxxxxxxxxxxxxxxxx; platform- > driver-x86@xxxxxxxxxxxxxxx; jiri@xxxxxxxxxxx; Michael Shych > <michaelsh@xxxxxxxxxxxx>; ivecera@xxxxxxxxxx > Subject: Re: [PATCH v1 2/8] platform/mellanox: mlxreg-hotplug: Improve > mechanism of ASIC health discovery > > On Sun, Jul 08, 2018 at 02:31:58PM +0000, Vadim Pasternak wrote: > > Extend the logic of ASIC health discovery. > > ASIC device can indicate its health state as a good, booting or dormant. > > During ASIC reset the device is dropped to dormant state and should > > get to the stable good health state through the intermediate booting state. > > The sequence for getting to the steady state health after reset is: > > dormant -> booting -> [booting -> good -> dormant -> booting] -> good. > > The sequence in brackets can be repeated several times. > > Initial implementation assumes that the this sequence is always > > repeated by the constant number. This patch allows to perform health > > steady state discovery independently of the repetitions number. > > > > Signed-off-by: Vadim Pasternak <vadimp@xxxxxxxxxxxx> > > --- > > drivers/platform/mellanox/mlxreg-hotplug.c | 38 > > +++++++++++++++++++++++++++--- > > 1 file changed, 35 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/platform/mellanox/mlxreg-hotplug.c > > b/drivers/platform/mellanox/mlxreg-hotplug.c > > index ac97aa0..f32637e 100644 > > --- a/drivers/platform/mellanox/mlxreg-hotplug.c > > +++ b/drivers/platform/mellanox/mlxreg-hotplug.c > > @@ -51,8 +51,9 @@ > > #define MLXREG_HOTPLUG_AGGR_MASK_OFF 1 > > > > /* ASIC health parameters. */ > > +#define MLXREG_HOTPLUG_DOWN_MASK 0x00 > > #define MLXREG_HOTPLUG_HEALTH_MASK 0x02 > > -#define MLXREG_HOTPLUG_RST_CNTR 3 > > +#define MLXREG_HOTPLUG_RST_CNTR 2 > > > > #define MLXREG_HOTPLUG_ATTRS_MAX 24 > > #define MLXREG_HOTPLUG_NOT_ASSERT 3 > > @@ -325,20 +326,51 @@ mlxreg_hotplug_health_work_helper(struct > mlxreg_hotplug_priv_data *priv, > > goto out; > > > > regval &= data->mask; > > - item->cache = regval; > > + /* > > + * ASIC health indication is provided through two bits. Bits > > + * value 0x2 indicates that ASIC reached the good health, value > > + * 0x0 indicates ASIC the bad health or dormant state and value > > + * 0x2 indicates the booting state. During ASIC reset it should > > A bit odd to have specific values in paragraph form in comments, generally I'd > expect to see these as defines, but as this is a counter and we aren't using these > specific values in the logic... OK - but perhaps something more obvious like an > itemized list: Hi Darren, Thank you for review. My mistake in comment. States from HW are: b00: Dormant b01: N/A b10: Good b11: Booting These are bitmask actually. I'll fix the comment. > > 0: Dormant > 1: Booting > 2: Good > > Note that above the text states that 0x2 is both "good" and "booting", we'll > need to at least correct that. Since it is a counter, and not a bitmask, the 0x > prefix isn't needed in my opinion. > > > + * pass the following states: dormant -> booting -> good. > > + * The transition from dormant to booting state and from > > + * booting to good state are indicated by ASIC twice, so actual > > + * sequence for getting to the steady state after reset is: > > + * dormant -> booting -> booting -> good -> good. It is > > + * possible that due to some hardware noise, the transition > > + * sequence will look like: dormant -> booting -> [ booting -> > > + * good -> dormant -> booting ] -> good -> good. > > + */ > > What does it mean to go from good -> good ?? Is that an intention transition to > the same state? It seems the above state transition diagram could be simplified > considerably like: > > (dormant -> booting -> good)+ OK. I'll fix it. Regarding booting - booting - good - good transition. I retested it again with scope and this is just some noise on line and I can ignore same -> same transition. I'll modify this patch and re-send. It also means that I can drop MLXREG_HOTPLUG_RST_CNTR at all. > > Where the + indicates this occurs 1 or more times. If this isn't accurate - what > am I missing? > > > if (regval == MLXREG_HOTPLUG_HEALTH_MASK) { > > - if ((data->health_cntr++ == > MLXREG_HOTPLUG_RST_CNTR) || > > + if ((++data->health_cntr == > MLXREG_HOTPLUG_RST_CNTR) || > > This is fairly confusing to read, 2 indicates good (I think, see above), but we're > comparing it to "RST_CNTR" - what does that mean? Using another define which > is more explicit would help with legibility. > > Hrm... both HEALTH_MASK and RST_CNTR are defined as 2.... which of these is > testing for the good state? > > > !priv->after_probe) { > > + /* > > + * ASIC is in steady state. Connect associated > > + * device, if configured. > > + */ > > mlxreg_hotplug_device_create(priv, data); > > data->attached = true; > > } > > } else { > > if (data->attached) { > > + /* > > + * ASIC health is dropped after ASIC has been > > What does "health is dropped" mean? Is this a state transition? good to > dormant? If so, using the same terminology would help legibility. > > > + * in steady state. Disconnect associated > > + * device, if it has been connected. > > + */ > > mlxreg_hotplug_device_destroy(data); > > data->attached = false; > > data->health_cntr = 0; > > + } else if (regval == MLXREG_HOTPLUG_DOWN_MASK > && > > + item->cache == > MLXREG_HOTPLUG_HEALTH_MASK) { > > + /* > > + * Decrease counter, if health has been > dropped > > + * before ASIC reaches the steady state, like: > > + * good -> dormant -> booting. > > + */ > > + data->health_cntr--; > > } > > } > > + item->cache = regval; > > I found this to be pretty confusing. The description of the states above don't > seem to map readily to the logic above. > > > > > /* Acknowledge event. */ > > ret = regmap_write(priv->regmap, data->reg + > > -- > > 2.1.4 > > > > > > -- > Darren Hart > VMware Open Source Technology Center