On 2024/9/23 17:02, Leon Romanovsky wrote:
> On Mon, Sep 23, 2024 at 02:17:40PM +0800, Junxian Huang wrote:
>>
>>
>> On 2024/9/20 20:47, Jason Gunthorpe wrote:
>>> On Fri, Sep 20, 2024 at 05:18:14PM +0800, Junxian Huang wrote:
>>>
>>>>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
>>>>>> index 4cb0af733587..49315f39361d 100644
>>>>>> --- a/drivers/infiniband/hw/hns/hns_roce_main.c
>>>>>> +++ b/drivers/infiniband/hw/hns/hns_roce_main.c
>>>>>> @@ -466,6 +466,11 @@ static int hns_roce_mmap(struct ib_ucontext *uctx, struct vm_area_struct *vma)
>>>>>>   pgprot_t prot;
>>>>>>   int ret;
>>>>>>
>>>>>> + if (hr_dev->dis_db) {
>>>>>
>>>>> How do you clear dis_db after calling to hns_roce_hw_v2_reset_notify_down()? Does it have any locking protection?
>>>>>
>>>>
>>>> Sorry for the late response, I just came back from vacation.
>>>>
>>>> After calling hns_roce_hw_v2_reset_notify_down(), we will call ib_unregister_device()
>>>> and destroy all HW resources eventually, so there is no need to clear dis_db.
>>>
>>> Why can't you do the unregister device sooner then and avoid all this
>>> special stuff?
>>>
>>
>> It's a limitation of the HW. Resources such as QP/CQ/MR will be destroyed
>> while unregistering the device. The HW does not allow this until
>> hns_roce_hw_v2_reset_notify_uninit(), otherwise it may lead to HW errors.
>
> It is an interesting claim given the fact that you are changing original
> code from 2016.
>

Well, this isn't a new issue. We sent a patch trying to address it in
2019 [1], but that solution wasn't the right approach since it relied on
userspace. We hadn't come up with a new solution since then, until this
recent series.

[1] https://lore.kernel.org/linux-rdma/20190812055220.GA8440@xxxxxxxxxxxxxxxxxx/

Junxian

> Thanks
>
>>
>>> I assumed you'd bring the same device back after completing the reset??
>>>
>>
>> Yes
>>
>>> Jason
>>