Re: [EXT] Re: [PATCH net-next v4 8/8] octeon_ep: add heartbeat monitor

On Thu, Mar 23, 2023 at 06:14:10PM +0000, Veerasenareddy Burru wrote:
> 
> 
> > -----Original Message-----
> > From: Leon Romanovsky <leon@xxxxxxxxxx>
> > Sent: Thursday, March 23, 2023 3:47 AM
> > To: Veerasenareddy Burru <vburru@xxxxxxxxxxx>
> > Cc: netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Abhijit Ayarekar
> > <aayarekar@xxxxxxxxxxx>; Sathesh B Edara <sedara@xxxxxxxxxxx>;
> > Satananda Burla <sburla@xxxxxxxxxxx>; linux-doc@xxxxxxxxxxxxxxx; David S.
> > Miller <davem@xxxxxxxxxxxxx>; Eric Dumazet <edumazet@xxxxxxxxxx>;
> > Jakub Kicinski <kuba@xxxxxxxxxx>; Paolo Abeni <pabeni@xxxxxxxxxx>
> > Subject: [EXT] Re: [PATCH net-next v4 8/8] octeon_ep: add heartbeat
> > monitor
> > 
> > External Email
> > 
> > ----------------------------------------------------------------------
> > On Wed, Mar 22, 2023 at 02:19:57AM -0700, Veerasenareddy Burru wrote:
> > > Monitor periodic heartbeat messages from device firmware.
> > > Presence of heartbeat indicates the device is active and running.
> > > If the heartbeat is missed for configured interval indicates firmware
> > > has crashed and device is unusable; in this case, PF driver stops and
> > > uninitialize the device.
> > >
> > > Signed-off-by: Veerasenareddy Burru <vburru@xxxxxxxxxxx>
> > > Signed-off-by: Abhijit Ayarekar <aayarekar@xxxxxxxxxxx>
> > > ---
> > > v3 -> v4:
> > >  * 0007-xxx.patch in v3 is 0008-xxx.patch in v4.
> > >
> > > v2 -> v3:
> > >  * 0009-xxx.patch in v2 is now 0007-xxx.patch in v3 due to
> > >    0007 and 0008.patch from v2 are removed in v3.
> > >
> > > v1 -> v2:
> > >  * no change

<...>

> > > +	struct octep_device *oct = container_of(work, struct octep_device,
> > > +						hb_task.work);
> > > +
> > > +	int miss_cnt;
> > > +
> > > +	atomic_inc(&oct->hb_miss_cnt);
> > > +	miss_cnt = atomic_read(&oct->hb_miss_cnt);
> > 
> > miss_cnt = atomic_inc_return(&oct->hb_miss_cnt);
> > 
> 
> Thanks for the feedback. Will fix it.
> 
> > > +	if (miss_cnt < oct->conf->max_hb_miss_cnt) {
> > 
> > How is this heartbeat working? You increment on every entry to
> > octep_hb_timeout_task(), After max_hb_miss_cnt invocations, you will stop
> > your device.
> > 
> > Thanks
> > 
> 
> Yes, device will be stopped after max_hb_miss_cnt heartbeats are missed.

If I read the code correctly, the device will stop after max_hb_miss_cnt
calls to octep_hb_timeout_task(), which runs every
msecs_to_jiffies(oct->conf->hb_interval * 1000).
You don't cancel/reschedule the job if the timeout doesn't happen.
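
To make the concern concrete, here is a minimal userspace sketch of how a
miss counter like this would normally be expected to behave. It assumes
the counter is reset whenever a heartbeat arrives; that reset is not
visible in the quoted hunk, so treat it as an assumption rather than a
description of the patch. The names hb_monitor, hb_received() and
hb_timeout_tick() are hypothetical, and C11 atomic_fetch_add() stands in
for the kernel's atomic_inc_return().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-device heartbeat state. */
struct hb_monitor {
	atomic_int miss_cnt;	/* intervals elapsed since the last heartbeat */
	int max_miss_cnt;	/* threshold at which the device is given up on */
};

static void hb_monitor_init(struct hb_monitor *m, int max_miss_cnt)
{
	atomic_init(&m->miss_cnt, 0);
	m->max_miss_cnt = max_miss_cnt;
}

/* Assumption: called when a heartbeat message arrives from firmware. */
static void hb_received(struct hb_monitor *m)
{
	atomic_store(&m->miss_cnt, 0);
}

/*
 * Models one run of the periodic timeout task.
 * Returns true if the task should be queued again, false if the
 * device should be stopped.
 */
static bool hb_timeout_tick(struct hb_monitor *m)
{
	/*
	 * atomic_fetch_add() returns the old value, so adding 1 yields
	 * the new value from a single atomic operation. A separate
	 * increment followed by a read could observe a concurrent update.
	 */
	int miss_cnt = atomic_fetch_add(&m->miss_cnt, 1) + 1;

	return miss_cnt < m->max_miss_cnt;
}
```

Without something like hb_received() clearing the counter, every tick
counts as a miss and the device is stopped after max_miss_cnt intervals
even when heartbeats are arriving.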

Thanks

> 
> > > +		queue_delayed_work(octep_wq, &oct->hb_task,
> > > +				   msecs_to_jiffies(oct->conf->hb_interval *
> > 1000));


