Re: [PATCH resend v3 rdma-rc 1/1] RDMA/hns: Add the process of AEQ overflow for hip08

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Jan 12, 2019 at 08:21:38PM +0800, tanxiaofei wrote:
> Hi Jason,
> 
> On 2019/1/12 5:25, Jason Gunthorpe wrote:
> > On Wed, Jan 09, 2019 at 09:35:46AM +0800, Xiaofei Tan wrote:
> >> AEQ overflow will be reported by hardware when too many
> >> asynchronous events occurred but not be handled in time.
> >> Normally, AEQ overflow error is not easy to occur. Once
> >> happened, we have to do physical function reset to recover.
> >> PF reset is implemented in two steps. Firstly, set reset
> >> level with ae_dev->ops->set_default_reset_request.
> >> Secondly, run reset with ae_dev->ops->reset_event.
> >>
> >> Signed-off-by: Xiaofei Tan <tanxiaofei@xxxxxxxxxx>
> >> Signed-off-by: Yixian Liu <liuyixian@xxxxxxxxxx>
> >>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 11 +++++++++++
> >>  1 file changed, 11 insertions(+)
> > 
> > Why should this be a -rc patch? It doesn't look like it meets the
> > requires for a stable kernel, or fixing something introduced in the
> > merge window.
> > 
> > This looks like a new feature to me.
>
> I think we could take this as a bug. Because the device can't
> continue working, if we don't handle the AEQ overflow error. And the
> code is simple and doesn't bring any unstable factor.  Then i take
> it as -rc patch.

recovery from hardware failures is a new feature.

If this is fixing a bug in the existing recovery, or is a major
problem that many users are hitting, then explain in the commit
comment.

Jason



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux