> On Mar 30, 2017, at 7:30 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
>> + spinlock_t sc_rw_ctxt_lock;
>> + struct list_head sc_rw_ctxts;
>
> It's a little sad that we always need a list and a spinlock when
> most requests should need a single context only.

The current code needs resources protected by several spinlocks,
some of which disable bottom halves. This rewrite takes it down to
just this one plain vanilla spinlock, which picks up all the
svcrdma layer resources needed for the I/O at once. There are some
common cases which can require more than one of these.

My point is, I think this is better than trips to a memory
allocator, because those frequently require at least one
BH-disabled or irqsave spinlock. Staying out of the allocator
helps prevent latency outliers and, rarely, allocation failures.

That said, I will happily consider any solution that does not
require critical sections!

>> + * Each WR chain handles a single contiguous server-side buffer,
>> + * because some registration modes (eg. FRWR) do not support a
>> + * discontiguous scatterlist.
>
> Both FRWR and FMR have no problem with a discontiguous page list,
> they only have a problem with any segment but the first not starting
> page aligned. For NFS you'll need vectored direct I/O to hit that
> case.

I'll rewrite the comment.

For the Write chunk path, each RDMA segment in the chunk can have
a different R_key, so each non-empty segment gets its own rdma_rw
chain. A well-behaved client uses a single large segment, but not
all clients do.

The Reply chunk case occurs commonly, and can require three or
more separate scatterlists, due to the alignment constraint. Each
RPC Reply resides in an xdr_buf, each of which has up to three
portions:

1. A head, which is not necessarily page-aligned,

2. A page list, which does not have to be page-aligned, and

3. A tail, which is frequently but not always in the same page
   as the head (and is thus not expected to be page-aligned).
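
To make the chain count concrete, here is a minimal userspace sketch
of how the three portions above interact with client-provided
segments. It is not the svcrdma code. The types reply_portion and
client_segment and the function count_write_chains() are hypothetical
names invented for this illustration, and the model assumes that a
new chain starts whenever either the current portion or the current
segment is used up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one part of an xdr_buf (head, pages, tail). */
struct reply_portion {
	size_t len;
};

/* Hypothetical stand-in for one client-provided RDMA segment. */
struct client_segment {
	unsigned int rkey;	/* carried for illustration only */
	size_t len;
};

/*
 * Walk the reply portions and the client segments in lock step.
 * Every step consumes the smaller of "bytes left in this portion"
 * and "bytes left in this segment", and counts as one chain.
 *
 * Returns the number of chains, or -1 if the segments are too
 * small to hold the whole reply.
 */
static int count_write_chains(const struct reply_portion *p, size_t np,
			      const struct client_segment *s, size_t ns)
{
	size_t pi = 0, si = 0;
	size_t p_left, s_left;
	int chains = 0;

	assert(p != NULL || np == 0);
	assert(s != NULL || ns == 0);

	p_left = np ? p[0].len : 0;
	s_left = ns ? s[0].len : 0;

	for (;;) {
		size_t n;

		/* Skip portions that are empty or fully written. */
		while (pi < np && p_left == 0) {
			pi++;
			if (pi < np)
				p_left = p[pi].len;
		}
		if (pi == np)
			return chains;	/* whole reply placed */

		/* Skip segments that are empty or fully consumed. */
		while (si < ns && s_left == 0) {
			si++;
			if (si < ns)
				s_left = s[si].len;
		}
		if (si == ns)
			return -1;	/* ran out of client segments */

		n = p_left < s_left ? p_left : s_left;
		p_left -= n;
		s_left -= n;
		chains++;
	}
}
```

Under this model, a reply with a 100-byte head, an 8192-byte page
list, and a 20-byte tail needs three chains when the client supplies
one 16384-byte segment, and five chains when the client supplies
three 4096-byte segments.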

The client can provide multiple segments, each with its own R_key.
The server has to fit the RDMA Writes into both the alignment
constraints of the xdr_buf components and the segments provided
by the client.

This is why I organized the "write the reply chunk" path this way.

Thanks to both you and Sagi for excellent review comments.

--
Chuck Lever