On Thu, 2017-02-23 at 03:25 +0000, Trond Myklebust wrote:
> On Wed, 2017-02-22 at 16:31 -0500, Chuck Lever wrote:
> > Hey Trond-
> >
> > To support the ability to unload the underlying RDMA device's kernel
> > driver while NFS mounts are active, xprtrdma needs the ability to
> > suspend RPC sends temporarily while the transport hands HW resources
> > back to the driver. Once the device driver is unloaded, the RDMA
> > transport is left disconnected, and RPCs will be suspended normally
> > until a connection is possible again (e.g., a new device is made
> > available).
> >
> > A DEVICE_REMOVAL event is an upcall to xprtrdma that may sleep. Upon
> > its return, the device driver unloads itself. Currently my prototype
> > frees all HW resources during the upcall, but that doesn't block
> > new RPCs from trying to use those resources at the same time.
> >
> > Seems like the most natural way to temporarily block sends would be
> > to grab the transport's write lock, just like "connect" does, while
> > the transport is dealing with DEVICE_REMOVAL, then release it once
> > all HW resources have been freed.
> >
> > Unfortunately an RPC task is needed to acquire the write lock. But
> > disconnect is just an asynchronous event; there is no RPC task
> > associated with it, and thus no context that the RPC scheduler
> > can put to sleep if there happens to be another RPC sending at the
> > moment a device removal event occurs.
> >
> > I was looking at xprt_lock_connect, but that doesn't appear to do
> > quite what I need.
> >
> > Another thought was to have the DEVICE_REMOVAL upcall mark the
> > transport disconnected, send an asynchronous NULL RPC, then wait
> > on a kernel waitqueue.
> >
> > The NULL RPC would grab the write lock and kick the transport's
> > connect worker. The connect worker would free HW resources, then
> > awaken the waiter. Then the upcall could return to the driver.
> >
> > The problem with this scheme is the same as it was for the
> > keepalive work: there's no task or rpc_clnt available to the
> > DEVICE_REMOVAL upcall. Sleeping until the write lock is available
> > would require a task, and sending a NULL RPC would require an
> > rpc_clnt.
> >
> > Any advice/thoughts about this?
> >
>
> Can you perhaps use XPRT_FORCE_DISCONNECT? That does end up calling

Sorry. Dunno how that ended up all-caps. I did mean xprt_force_disconnect().

> the xprt->ops->close() callback as soon as the XPRT_LOCK state has been
> freed. You still won't have a client, but you will be guaranteed
> exclusive access to the transport, and you can do things like waking up
> any sleeping tasks on the transmit and receive queue to help you.
> However you also have to deal with the case where the transport was
> idle to start with.
>
> The big problem that you have here is ultimately that the low level
> control channel for the transport appears to want to use the RPC upper
> layer functionality for its communication mechanism. AFAICS you will
> keep hitting issues as the control channel needs to circumvent all the
> queueing etc. that these upper layers are designed to enforce.
> Given that these messages you're sending are just null pings with no
> payload and no special authentication needs or anything else, might it
> make sense to just generate them in the RDMA layer itself?
>
> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@xxxxxxxxxxxxxxx