On Sat, Jan 12, 2019 at 08:21:38PM +0800, tanxiaofei wrote:
> Hi Jason,
>
> On 2019/1/12 5:25, Jason Gunthorpe wrote:
> > On Wed, Jan 09, 2019 at 09:35:46AM +0800, Xiaofei Tan wrote:
> >> AEQ overflow will be reported by hardware when too many
> >> asynchronous events occurred but not be handled in time.
> >> Normally, AEQ overflow error is not easy to occur. Once
> >> happened, we have to do physical function reset to recover.
> >> PF reset is implemented in two steps. Firstly, set reset
> >> level with ae_dev->ops->set_default_reset_request.
> >> Secondly, run reset with ae_dev->ops->reset_event.
> >>
> >> Signed-off-by: Xiaofei Tan <tanxiaofei@xxxxxxxxxx>
> >> Signed-off-by: Yixian Liu <liuyixian@xxxxxxxxxx>
> >> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 11 +++++++++++
> >> 1 file changed, 11 insertions(+)
> >
> > Why should this be a -rc patch? It doesn't look like it meets the
> > requires for a stable kernel, or fixing something introduced in the
> > merge window.
> >
> > This looks like a new feature to me.
>
> I think we could take this as a bug. Because the device can't
> continue working, if we don't handle the AEQ overflow error. And the
> code is simple and doesn't bring any unstable factor. Then i take
> it as -rc patch.

recovery from hardware failures is a new feature.

If this is fixing a bug in the existing recovery, or is a major
problem that many users are hitting, then explain in the commit
comment.

Jason