-----"Jason Gunthorpe" <jgg@xxxxxxxx> wrote: ----- >To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx> >From: "Jason Gunthorpe" <jgg@xxxxxxxx> >Date: 02/11/2019 07:03PM >Cc: linux-rdma@xxxxxxxxxxxxxxx >Subject: Re: [PATCH v4 03/13] SIW network and RDMA core interface > >On Mon, Feb 11, 2019 at 04:20:53PM +0000, Bernard Metzler wrote: >> >> >To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx> >> >From: "Jason Gunthorpe" <jgg@xxxxxxxx> >> >Date: 01/31/2019 06:22PM >> >Cc: linux-rdma@xxxxxxxxxxxxxxx >> >Subject: Re: [PATCH v4 03/13] SIW network and RDMA core interface >> > >> >On Thu, Jan 31, 2019 at 03:24:01PM +0000, Bernard Metzler wrote: >> > >> >> >> +static void siw_netdev_unregistered(struct work_struct >*work) >> >> >> +{ >> >> >> + struct siw_device *sdev = container_of(work, struct >> >siw_device, >> >> >> + netdev_unregister); >> >> >> + >> >> >> + struct siw_qp_attrs qp_attrs; >> >> >> + struct list_head *pos, *tmp; >> >> >> + >> >> >> + memset(&qp_attrs, 0, sizeof(qp_attrs)); >> >> >> + qp_attrs.state = SIW_QP_STATE_ERROR; >> >> >> + >> >> >> + /* >> >> >> + * Mark all current QP's of this device dead >> >> >> + */ >> >> >> + list_for_each_safe(pos, tmp, &sdev->qp_list) { >> >> >> + struct siw_qp *qp = list_entry(pos, struct siw_qp, devq); >> >> >> + >> >> >> + down_write(&qp->state_lock); >> >> >> + (void) siw_qp_modify(qp, &qp_attrs, SIW_QP_ATTR_STATE); >> >> >> + up_write(&qp->state_lock); >> >> >> + } >> >> > >> >> >Don't put code around ib_unregister. If you need to do things >use >> >a >> >> >callback at the right moment in the unregister flow. You might >> >need >> >> >to >> >> >add a callback (see my latest series) >> >> >> >> I thought by having that executed in a work queue context only >> >> we'd be safe. I have to bring down all related QP's if a network >> >device >> >> suddenly disappears... Looks like I have to better understand >the >> >> intended unregister flow. >> > >> >If it is important it should be done during module unload and link >> >removal, if it is not important, it shouldn't be done :) >> >> If siw catches a NETDEV_UNREGISTER event, it shall first >> close all related connection endpoints before the ib_device >> can go away. > >For things to work properly you can't do any steps outside the >unregister and its callbacks. > >NETDEV_UNREGISTER events should only call >ib_unregsiter_device_queued() and nothing more. > >> Following your new device management design, I would now >> first schedule local work to cleanup those siw private >> resources, and later ib_unregister_device_queued(), > >You should not schedule work. > >Drivers should be designed so that when all PD's are closed most of >their resources are gone as well. The core code will ensure all PD's >are closed. > The problem probably is, that PD's are not going away as long as the application sits on them. What I want is the provider to break affected connections, if a link goes down and let the application know about it. To exemplify this - if I have a netdev which wants to unregister and a connection using it - it cannot unregister since the rdma core does not free up affected resources as long as referenced (by the application). e.g., I see things like unregister_netdevice: waiting for enp1s0f4 to become free. Usage count = 1 What I just tried is to use the preceding NETDEV_GOING_DOWN event to mark my affected QP's dead. That seem to work well. >If there are internal driver PD/etcs then we should add a callback to >clean them up right after the clients are cleaned up. > >> or is it OK to do that local cleanup stuff within the >> dealloc_driver() method? > >dealloc_driver should free memory, and do any final fences against >async work. It shouldn't spawn new work. > >It is an appropriate place to free socket resources and other related >things. > >Jason > > Thanks Bernard.