On Tue, Nov 08, 2016 at 04:05:54PM -0500, Chuck Lever wrote:
> 
> > On Nov 8, 2016, at 3:20 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> > 
> > On Tue, Nov 08, 2016 at 03:13:13PM -0500, Chuck Lever wrote:
> >> 
> >>> On Nov 8, 2016, at 3:03 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> >>> 
> >>> On Sat, Oct 29, 2016 at 12:21:03PM -0400, Chuck Lever wrote:
> >>>> Hi Bruce-
> >>> 
> >>> Sorry for the slow response!
> >>> 
> >>> ...
> >>>> In commit 39a9beab5acb83176e8b9a4f0778749a09341f1f ('rpc: share one xps between
> >>>> all backchannels') you added:
> >>>> 
> >>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> >>>> index f5572e3..4f01f63 100644
> >>>> --- a/net/sunrpc/svc_xprt.c
> >>>> +++ b/net/sunrpc/svc_xprt.c
> >>>> @@ -136,6 +136,8 @@ static void svc_xprt_free(struct kref *kref)
> >>>>  	/* See comment on corresponding get in xs_setup_bc_tcp(): */
> >>>>  	if (xprt->xpt_bc_xprt)
> >>>>  		xprt_put(xprt->xpt_bc_xprt);
> >>>> +	if (xprt->xpt_bc_xps)
> >>>> +		xprt_switch_put(xprt->xpt_bc_xps);
> >>>>  	xprt->xpt_ops->xpo_free(xprt);
> >>>>  	module_put(owner);
> >>>>  }
> >>>> 
> >>>> svc_xprt_free() is invoked by svc_xprt_put(). svc_xprt_put() is called
> >>>> from svc_rdma in soft IRQ context (e.g. svc_rdma_wc_receive).
> >>> 
> >>> Is that necessary? I wonder why the svcrdma code seems to be taking so
> >>> many of its own references on svc_xprts.
> >> 
> >> The idea is to keep the xprt around while Work Requests (I/O) are running,
> >> so that the xprt is guaranteed to be there during work completions. The
> >> completion handlers (where svc_xprt_put is often invoked) run in soft IRQ
> >> context.
> >> 
> >> It's simple to change completions to use a Work Queue instead, but testing
> >> so far shows that will result in a performance loss. I'm still studying it.
> >> 
> >> Is there another way to keep the xprt's ref count boosted while I/O is
> >> going on?
> > 
> > Why do you need the svc_xprt in the completion?
> 
> 1. The svc_xprt contains the svcrdma_xprt, which contains the Send Queue
> accounting mechanism. SQ accounting has to be updated for each completion:
> the completion indicates that the SQ entry used by this Work Request is
> now available, and that other callers waiting for enough SQEs can go ahead.
> 
> 2. When handling a Receive completion, the incoming RPC message is enqueued
> on the svc_xprt via svc_xprt_enqueue, unless RDMA Reads are needed.
> 
> 3. When handling a Read completion, the incoming RPC message is enqueued on
> the svc_xprt via svc_xprt_enqueue.
> 
> 
> > Can the xpo_detach method wait for any pending completions?
> 
> So the completions would call wake_up on a waitqueue, or use a kernel
> completion? That sounds more expensive than managing an atomic reference
> count.
> 
> I suppose some other reference count could be used to trigger a work item
> that would invoke xpo_detach.
> 
> But based on your comments, then, svc_xprt_put() was never intended to be
> BH-safe.

I'm not sure what was intended. It doesn't look to me like any other
callers require it.

--b.

> 
> 
> > --b.
> > 
> >> 
> >> 
> >>>> However, xprt_switch_put() takes a spin lock (xps_lock) which is locked
> >>>> everywhere without disabling BHs.
> >>>> 
> >>>> It looks to me like 39a9beab5acb makes svc_xprt_put() no longer BH-safe?
> >>>> Not sure if svc_xprt_put() was intended to be BH-safe beforehand.
> >>>> 
> >>>> Maybe xprt_switch_put() could be invoked in ->xpo_free, but that seems
> >>>> like a temporary solution.
> >>> 
> >>> Since xpo_free is also called from svc_xprt_put that doesn't sound like
> >>> it would change anything. Or do we not trunk over RDMA for some reason?
> >> 
> >> It's quite desirable to trunk NFS/RDMA on multiple connections, and it
> >> should work just like it does for TCP, but so far it's not been tested.
> 
> -- 
> Chuck Lever
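
For illustration only, here is a rough sketch of the "trigger a work item"
idea floated above, applied to the final put itself rather than to
xpo_detach. All of the names below (svc_xprt_put_work, svc_xprt_put_worker,
svc_xprt_put_bh) are hypothetical and the code is untested; the point is
only that deferring the last svc_xprt_put() to process context would keep
xprt_switch_put(), and its non-BH-safe xps_lock, off the soft-IRQ
completion path:

#include <linux/slab.h>
#include <linux/workqueue.h>
#include <linux/sunrpc/svc_xprt.h>

/* Hypothetical wrapper: one small allocation per deferred put. */
struct svc_xprt_put_work {
	struct work_struct	work;
	struct svc_xprt		*xprt;
};

static void svc_xprt_put_worker(struct work_struct *work)
{
	struct svc_xprt_put_work *pw =
		container_of(work, struct svc_xprt_put_work, work);

	/* Process context: if this turns out to be the final put,
	 * svc_xprt_free() can reach xprt_switch_put() without
	 * running under BH. */
	svc_xprt_put(pw->xprt);
	kfree(pw);
}

/* Drop a reference from a completion handler (soft IRQ) without
 * risking the non-BH-safe xps_lock in svc_xprt_free(). */
static void svc_xprt_put_bh(struct svc_xprt *xprt)
{
	struct svc_xprt_put_work *pw;

	pw = kmalloc(sizeof(*pw), GFP_ATOMIC);
	if (pw) {
		pw->xprt = xprt;
		INIT_WORK(&pw->work, svc_xprt_put_worker);
		schedule_work(&pw->work);
	} else {
		/* Fallback: only safe if this cannot be the final put. */
		svc_xprt_put(xprt);
	}
}

Whether a deferral like this (or having xpo_detach wait on a waitqueue or
kernel completion, as discussed above) is cheaper than the current per-I/O
reference counting is exactly the open question in this thread.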