Re: [PATCH resend v3 rdma-rc 1/1] RDMA/hns: Add the process of AEQ overflow for hip08

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jason,

On 2019/1/19 3:58, Jason Gunthorpe wrote:
> On Wed, Jan 16, 2019 at 08:32:30PM +0800, tanxiaofei wrote:
>>
>> On 2019/1/16 6:03, Jason Gunthorpe wrote:
>>> On Sat, Jan 12, 2019 at 08:21:38PM +0800, tanxiaofei wrote:
>>>> Hi Jason,
>>>>
>>>> On 2019/1/12 5:25, Jason Gunthorpe wrote:
>>>>> On Wed, Jan 09, 2019 at 09:35:46AM +0800, Xiaofei Tan wrote:
>>>>>> AEQ overflow will be reported by hardware when too many
>>>>>> asynchronous events occurred but not be handled in time.
>>>>>> Normally, AEQ overflow error is not easy to occur. Once
>>>>>> happened, we have to do physical function reset to recover.
>>>>>> PF reset is implemented in two steps. Firstly, set reset
>>>>>> level with ae_dev->ops->set_default_reset_request.
>>>>>> Secondly, run reset with ae_dev->ops->reset_event.
>>>>>>
>>>>>> Signed-off-by: Xiaofei Tan <tanxiaofei@xxxxxxxxxx>
>>>>>> Signed-off-by: Yixian Liu <liuyixian@xxxxxxxxxx>
>>>>>>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 11 +++++++++++
>>>>>>  1 file changed, 11 insertions(+)
>>>>>
>>>>> Why should this be a -rc patch? It doesn't look like it meets the
>>>>> requires for a stable kernel, or fixing something introduced in the
>>>>> merge window.
>>>>>
>>>>> This looks like a new feature to me.
>>>>
>>>> I think we could take this as a bug. Because the device can't
>>>> continue working, if we don't handle the AEQ overflow error. And the
>>>> code is simple and doesn't bring any unstable factor.  Then i take
>>>> it as -rc patch.
>>>
>>> recovery from hardware failures is a new feature.
>>>
>>> If this is fixing a bug in the existing recovery, or is a major
>>> problem that many users are hitting, then explain in the commit
>>> comment.
>>>
>> Hi Jason,
>>
>> Thank you very much.
>> How about change the commit log as following?
>>
>> RDMA/hns: fix the issue of handling AEQ overflow for hip08
>>
>> AEQ overflow will be reported by hardware when too many
>> asynchronous events occurred but not be handled in time.
>>
>> Currently, we just do the print when AEQ overflow happened.
>> This is not right. We should also do physical function reset,
>> otherwise, the device can't continue working.
>>
>> PF reset is implemented in two steps. Firstly, set reset
>> level with ae_dev->ops->set_default_reset_request.
>> Secondly, run reset with ae_dev->ops->reset_event.
> 
> This still doesn't explain why this is for -rc.
> 
> New features are not for the -rc tree.
> 
> You'd need to explain how/why that this is a widespread a critical
> problem in your user base
> 

Maybe this patch is not suitable for the -rc branch.
Hmm, I will resend this patch based on -next. Many thanks.

> Jason
> 
> .
> 

-- 
 thanks
tanxiaofei




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux