On Sat, May 26, 2018 at 09:47:43AM +0800, Wei Hu (Xavier) wrote:
>
>
> On 2018/5/25 22:55, Jason Gunthorpe wrote:
> > On Fri, May 25, 2018 at 01:54:31PM +0800, Wei Hu (Xavier) wrote:
> >>
> >> On 2018/5/25 5:31, Jason Gunthorpe wrote:
> >>>>  static const struct hnae3_client_ops hns_roce_hw_v2_ops = {
> >>>>      .init_instance = hns_roce_hw_v2_init_instance,
> >>>>      .uninit_instance = hns_roce_hw_v2_uninit_instance,
> >>>> +    .reset_notify = hns_roce_hw_v2_reset_notify,
> >>>>  };
> >>>>
> >>>>  static struct hnae3_client hns_roce_hw_v2_client = {
> >>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
> >>>> index 1b79a38..ac51372 100644
> >>>> +++ b/drivers/infiniband/hw/hns/hns_roce_main.c
> >>>> @@ -332,6 +332,9 @@ static struct ib_ucontext *hns_roce_alloc_ucontext(struct ib_device *ib_dev,
> >>>>      struct hns_roce_ib_alloc_ucontext_resp resp = {};
> >>>>      struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
> >>>>
> >>>> +    if (!hr_dev->active)
> >>>> +        return ERR_PTR(-EAGAIN);
> >>> This still doesn't make sense, ib_unregister_device already makes sure
> >>> that hns_roce_alloc_ucontext isn't running and won't be called before
> >>> returning, don't need another flag to do that.
> >>>
> >>> Since this is the only place the active flag is tested it can just be deleted
> >>> entirely.
> >> Hi, Jason
> >>
> >> roce reset process:
> >> 1. hr_dev->active = false; //make sure no any process call
> >> ibv_open_device.
> >> 2. call ib_dispatch_event() function to report IB_EVENT_DEVICE_FATAL
> >> event.
> >> 3. msleep(100); // for some app to free resources
> >> 4. call ib_unregister_device().
> >> 5. ...
> >> 6. ...
> >>
> >> There are 2 steps as above before calling ib_unregister_device(), we
> >> evaluate
> >> hr_dev->active with false to avoid no any process call
> >> ibv_open_device.
> > If you think this is the right flow then it is core issue to block new
> > opens, not an individual driver issue, send a core patch - eg add a
> > 'ib_driver_fatal_error()' call that does the dispatch and call it from
> > all the drivers using this IB_EVENT_DEVICE_FATAL
> Hi, Jason
>
> It seem to be no difference between calling ib_driver_fatal_error and
> calling ib_dispatch_event directly in manufacturer driver.
>
> void ib_driver_fatal_error(struct ib_device *ib_dev, u8 port_num)
> {
>     struct ib_event event;
>
>     event.event = IB_EVENT_DEVICE_FATAL;
>     event.device = ib_dev;
>     event.element.port_num = port_num;
>     ib_dispatch_event(&event);
> }

My point was the core code should block calling hns_roce_alloc_ucontext
after DEVICE_FATAL if we agree that is correct, the driver shouldn't be
doing it.

Jason
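
For illustration only, the point above (the core, rather than each driver,
refuses new opens once DEVICE_FATAL has been reported) could be modelled
roughly as follows. This is a userspace sketch under stated assumptions, not
kernel code and not a proposed patch: struct model_ib_device and every
model_* function are hypothetical names invented for this example, and the
locking a real implementation would need is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct ib_device; not the kernel type. */
struct model_ib_device {
    bool fatal;              /* set by the core helper, never by a driver */
    int driver_alloc_calls;  /* how many times the driver hook was reached */
};

/* Stand-in for a driver's alloc_ucontext hook. It carries no 'active' flag. */
static int model_driver_alloc_ucontext(struct model_ib_device *dev)
{
    dev->driver_alloc_calls++;
    return 0;
}

/*
 * Hypothetical core helper in the spirit of the suggested
 * ib_driver_fatal_error(): record the fatal state in the core and then
 * report the event. Real code would call ib_dispatch_event() with
 * IB_EVENT_DEVICE_FATAL here; the model only records the state.
 */
static void model_core_fatal_error(struct model_ib_device *dev)
{
    dev->fatal = true;
}

/*
 * Core-side entry point for opening a context. The gate lives here, so it
 * applies to every driver without per-driver flags.
 */
static int model_core_alloc_ucontext(struct model_ib_device *dev)
{
    if (dev->fatal)
        return -EAGAIN;
    return model_driver_alloc_ucontext(dev);
}
```

In this model, a call to model_core_alloc_ucontext() before
model_core_fatal_error() reaches the driver hook and returns 0; any call
afterwards returns -EAGAIN without entering the driver. The only difference
from the helper quoted above is that the core remembers the fatal state
instead of merely forwarding the event.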