On Wed, 2011-04-13 at 13:56 -0400, Andy Adamson wrote:
> On Apr 13, 2011, at 1:20 PM, Jeff Layton wrote:
> 
> > On Wed, 13 Apr 2011 10:22:13 -0400
> > Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> wrote:
> > 
> >> On Wed, 2011-04-13 at 10:02 -0400, Jeff Layton wrote:
> >>> We could put the rpc_rqst's into a slabcache, and give each rpc_xprt a
> >>> mempool with a minimum number of slots. Have them all be allocated with
> >>> GFP_NOWAIT. If it gets a NULL pointer back, then the task can sleep on
> >>> the waitqueue like it does today. Then, the clients can allocate
> >>> rpc_rqst's as they need as long as memory holds out for it.
> >>> 
> >>> We have the reserve_xprt stuff to handle congestion control anyway, so I
> >>> don't really see the value in the artificial limits that the slot table
> >>> provides.
> >>> 
> >>> Maybe I should hack up a patchset for this...
> >> 
> >> This issue has come up several times recently. My preference would be to
> >> tie the availability of slots to the TCP window size, and basically say
> >> that if the SOCK_ASYNC_NOSPACE flag is set on the socket, then we hold
> >> off allocating more slots until we get a ->write_space() callback which
> >> clears that flag.
> >> 
> >> For the RDMA case, we can continue to use the current system of a fixed
> >> number of preallocated slots.
> > 
> > I take it then that we'd want a similar scheme for UDP as well? I guess
> > I'm just not sure what the slot table is supposed to be for.
> 
> [andros] I look at the rpc_slot table as a representation of the amount of
> data the connection to the server can handle - basically the #slots should
> equal double the bandwidth-delay product divided by max(rsize, wsize). For
> TCP, this is the window size (RTT of a max-MTU ping * interface bandwidth).
> There is no reason to allocate more rpc_rqsts than can fit on the wire.

Agreed, but as I said earlier, there is no reason to even try to use UDP on
high-bandwidth links, so I suggest we just leave it as-is.

> > Possibly naive question, and maybe you or Andy have scoped this out
> > already...
> > 
> > Wouldn't it make more sense to allow the code to allocate rpc_rqst's as
> > needed, and manage congestion control in reserve_xprt?
> 
> [andros] Congestion control is not what the rpc_slot table is managing. It
> does need a minimum, which experience has set at 16; it's the maximum that
> needs to be dynamic. Congestion control by the lower layers should work
> unfettered within the # of rpc_slots. Today that is not always the case,
> when 16 slots is not enough to fill the wire and the administrator has not
> changed the # of rpc_slots.

Agreed. However, what we do need to ensure is that the networking layer is
aware that we have more data to send, and that it should negotiate a window
size increase with the server if possible. To do that, we need to allocate
just enough slots to hit the 'SOCK_ASYNC_NOSPACE' limit and then wait.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
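
For concreteness, here is a minimal sketch of the slabcache + mempool idea
from the top of the thread. The xprt->rq_pool field and both function names
are hypothetical (nothing like them exists in struct rpc_xprt today); the
real APIs used are mempool_create_slab_pool(), mempool_alloc() and
rpc_sleep_on(), and &xprt->backlog is the existing slot-wait queue.

#include <linux/slab.h>
#include <linux/mempool.h>
#include <linux/sunrpc/xprt.h>
#include <linux/sunrpc/sched.h>

#define RPC_MIN_SLOTS 16	/* the experience-derived floor noted above */

/*
 * Created once at module init with something like:
 *	rpc_rqst_slab = kmem_cache_create("rpc_rqst",
 *					  sizeof(struct rpc_rqst), 0, 0, NULL);
 */
static struct kmem_cache *rpc_rqst_slab;

/* Give each transport a pool that guarantees RPC_MIN_SLOTS rpc_rqsts */
static int xprt_alloc_slot_pool(struct rpc_xprt *xprt)
{
	/* rq_pool: hypothetical mempool_t * field added to struct rpc_xprt */
	xprt->rq_pool = mempool_create_slab_pool(RPC_MIN_SLOTS, rpc_rqst_slab);
	return xprt->rq_pool ? 0 : -ENOMEM;
}

static struct rpc_rqst *xprt_dynamic_alloc_slot(struct rpc_xprt *xprt,
						struct rpc_task *task)
{
	struct rpc_rqst *req;

	/* GFP_NOWAIT: never block here; we are in the rpc_task state machine */
	req = mempool_alloc(xprt->rq_pool, GFP_NOWAIT);
	if (req == NULL)
		/* Pool exhausted: sleep on the backlog queue as today */
		rpc_sleep_on(&xprt->backlog, task, NULL);
	return req;
}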
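
To put illustrative numbers (mine, not from the thread) on Andy's formula:
on a 1 Gbit/s link (~125 MB/s) with a 50 ms WAN RTT, the bandwidth-delay
product is 125 MB/s * 0.05 s = 6.25 MB, so with wsize = 32 KB the formula
gives #slots = 2 * 6.25 MB / 32 KB = 400 -- far beyond the default 16. On
the same link with a 0.2 ms LAN RTT, the BDP is only 25 KB, 2 * 25 KB /
32 KB rounds to 2 slots, and the minimum of 16 dominates. That is exactly
the case Andy describes: the default is fine on a LAN but cannot fill a
long fat pipe.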
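
And a sketch of the SOCK_ASYNC_NOSPACE gating described above, layered on
the allocator in the previous sketch, as if it lived in
net/sunrpc/xprtsock.c. xs_sock_has_write_space(), xs_tcp_alloc_slot() and
the xprt->num_reqs counter are assumptions of mine; SOCK_ASYNC_NOSPACE and
struct sock_xprt are real, and xs_tcp_write_space() is the existing
callback that clears the flag when the send buffer drains (a wake-up of
&xprt->backlog would have to be added there, not shown).

static bool xs_sock_has_write_space(struct sock_xprt *transport)
{
	struct socket *sock = transport->sock;

	/* Set by the socket layer when the send buffer filled up */
	return sock != NULL && !test_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
}

static struct rpc_rqst *xs_tcp_alloc_slot(struct rpc_xprt *xprt,
					  struct rpc_task *task)
{
	struct sock_xprt *transport =
		container_of(xprt, struct sock_xprt, xprt);

	/*
	 * Beyond the guaranteed minimum, grow only while the TCP window
	 * has room; otherwise park the task until ->write_space() fires.
	 * num_reqs: hypothetical count of slots currently allocated.
	 */
	if (xprt->num_reqs >= RPC_MIN_SLOTS &&
	    !xs_sock_has_write_space(transport)) {
		rpc_sleep_on(&xprt->backlog, task, NULL);
		return NULL;
	}
	return xprt_dynamic_alloc_slot(xprt, task);
}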