On Thu, Mar 23, 2023 at 06:14:10PM +0000, Veerasenareddy Burru wrote:
>
>
> > -----Original Message-----
> > From: Leon Romanovsky <leon@xxxxxxxxxx>
> > Sent: Thursday, March 23, 2023 3:47 AM
> > To: Veerasenareddy Burru <vburru@xxxxxxxxxxx>
> > Cc: netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Abhijit Ayarekar
> > <aayarekar@xxxxxxxxxxx>; Sathesh B Edara <sedara@xxxxxxxxxxx>;
> > Satananda Burla <sburla@xxxxxxxxxxx>; linux-doc@xxxxxxxxxxxxxxx; David S.
> > Miller <davem@xxxxxxxxxxxxx>; Eric Dumazet <edumazet@xxxxxxxxxx>;
> > Jakub Kicinski <kuba@xxxxxxxxxx>; Paolo Abeni <pabeni@xxxxxxxxxx>
> > Subject: [EXT] Re: [PATCH net-next v4 8/8] octeon_ep: add heartbeat
> > monitor
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > On Wed, Mar 22, 2023 at 02:19:57AM -0700, Veerasenareddy Burru wrote:
> > > Monitor periodic heartbeat messages from device firmware.
> > > Presence of heartbeat indicates the device is active and running.
> > > If the heartbeat is missed for configured interval indicates firmware
> > > has crashed and device is unusable; in this case, PF driver stops and
> > > uninitialize the device.
> > >
> > > Signed-off-by: Veerasenareddy Burru <vburru@xxxxxxxxxxx>
> > > Signed-off-by: Abhijit Ayarekar <aayarekar@xxxxxxxxxxx>
> > > ---
> > > v3 -> v4:
> > >  * 0007-xxx.patch in v3 is 0008-xxx.patch in v4.
> > >
> > > v2 -> v3:
> > >  * 0009-xxx.patch in v2 is now 0007-xxx.patch in v3 due to
> > >    0007 and 0008.patch from v2 are removed in v3.
> > >
> > > v1 -> v2:
> > >  * no change

<...>

> > > +	struct octep_device *oct = container_of(work, struct octep_device,
> > > +						hb_task.work);
> > > +
> > > +	int miss_cnt;
> > > +
> > > +	atomic_inc(&oct->hb_miss_cnt);
> > > +	miss_cnt = atomic_read(&oct->hb_miss_cnt);
> >
> > miss_cnt = atomic_inc_return(&oct->hb_miss_cnt);
> >
>
> Thanks for the feedback. Will fix it.
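
To illustrate the atomic_inc_return() suggestion, here is a minimal userspace sketch using C11 <stdatomic.h> rather than the kernel's atomic_t API. The helper names inc_return() and inc_then_read() are hypothetical and exist only for this example. The single-call form returns the value produced by this caller's own increment, while the two-step form can observe an update made by another thread between the increment and the read.

```c
#include <stdatomic.h>

/*
 * Userspace stand-in for the kernel's atomic_inc_return() (hypothetical
 * helper). C11 atomic_fetch_add() returns the value held before the add,
 * so adding one gives the post-increment value in a single atomic step.
 */
static int inc_return(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1;
}

/*
 * Two-step form, as in the quoted hunk. Another thread may change the
 * counter between the increment and the load, so the value returned is
 * not guaranteed to be the one this caller's increment produced.
 */
static int inc_then_read(atomic_int *v)
{
	atomic_fetch_add(v, 1);
	return atomic_load(v);
}
```

With a single thread both helpers return the same sequence of values; they can differ only when the counter is updated concurrently.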
>
> > > +	if (miss_cnt < oct->conf->max_hb_miss_cnt) {
> >
> > How is this heartbeat working? You increment on every entry to
> > octep_hb_timeout_task(), After max_hb_miss_cnt invocations, you will stop
> > your device.
> >
> > Thanks
> >
>
> Yes, device will be stopped after max_hb_miss_cnt heartbeats are missed.

If I read the code correctly, the device will stop after
octep_hb_timeout_task() calls, which happen every
msecs_to_jiffies(oct->conf->hb_interval * 1000). You don't
cancel/reschedule the job if the timeout doesn't happen.

Thanks

>
> > > +		queue_delayed_work(octep_wq, &oct->hb_task,
> > > +				   msecs_to_jiffies(oct->conf->hb_interval *
> > 1000));
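
For reference, the counting scheme under discussion can be modelled in userspace as follows. This is a sketch under stated assumptions, not the driver's code: struct hb_monitor and all three functions are hypothetical names, and hb_received() assumes that some heartbeat handler resets the counter, which is not visible in the quoted hunk. Without such a reset, the periodic task alone reaches the threshold after max_miss_cnt intervals whether or not the firmware is alive.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; names are illustrative, not the driver's. */
struct hb_monitor {
	atomic_int miss_cnt;	/* timer periods since the last heartbeat */
	int max_miss_cnt;	/* stop threshold, like max_hb_miss_cnt */
};

static void hb_monitor_init(struct hb_monitor *m, int max_miss_cnt)
{
	atomic_init(&m->miss_cnt, 0);
	m->max_miss_cnt = max_miss_cnt;
}

/* Assumed heartbeat path: a received heartbeat clears the miss count. */
static void hb_received(struct hb_monitor *m)
{
	atomic_store(&m->miss_cnt, 0);
}

/*
 * Runs once per interval. Returns true if the periodic task should be
 * re-armed, false once the threshold is reached and the device should
 * be stopped.
 */
static bool hb_timeout_tick(struct hb_monitor *m)
{
	int miss_cnt = atomic_fetch_add(&m->miss_cnt, 1) + 1;

	return miss_cnt < m->max_miss_cnt;
}
```

In this model, with a threshold of 3, two ticks followed by a heartbeat and two more ticks keep the monitor armed; only a third consecutive tick without a heartbeat reports that the device should be stopped.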