Re: Congestion window or other reason?

On Sep 26, 2008, at 5:24 PM, Talpey, Thomas wrote:

Ok, you've got my attention! Is the code visible somewhere btw?

No, it is in our internal CVS. I can send you a tarball if you want to take a look.

Interesting. The client does not have a global view, unfortunately,
and has no idea how busy the server is (i.e. how many other clients it
is servicing).

Correct, because the NFS protocol is not designed this way. However,
the server can manage clients via the RPCRDMA credit mechanism, by
allowing them to send more or fewer messages in response to its own
load.
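
A rough sketch of what such credit management might look like; the
function, constants, and load heuristic here are invented for
illustration and are not the actual RPCRDMA implementation:

/*
 * Hypothetical server-side credit management: each reply advertises
 * how many requests the client may keep outstanding, scaled down as
 * more clients compete for the server's posted receive buffers.
 */
#define MAX_CREDITS   32
#define MIN_CREDITS    1

static unsigned int server_grant_credits(unsigned int posted_receives,
                                         unsigned int active_clients)
{
        unsigned int credits;

        if (active_clients == 0)
                return MAX_CREDITS;

        /* Divide the posted receive buffers among the active clients. */
        credits = posted_receives / active_clients;

        if (credits > MAX_CREDITS)
                credits = MAX_CREDITS;
        if (credits < MIN_CREDITS)
                credits = MIN_CREDITS;

        return credits;    /* placed in the credit field of each reply */
}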

I believe that I am duplicating the RPCRDMA usage of credits. I need to check.

RPCRDMA credits are primarily used for this. It's not so much the fact that there's a queue pair; it's actually the number of posted receives. If the client sends more than the server has available, the connection will fail. However, the server can implement something called a "shared receive queue", which permits a sort of oversubscription.

MX's behavior is more like the shared receive queue. Unexpected messages of 32 KB or less are stored in a temporary buffer until the matching receive has been posted. Once it is posted, the data is copied into the receive buffer and the app can complete the request by testing (polling) or waiting (blocking).

MX also gives an app the ability to supply a function to handle unexpected messages. Instead of pre-posting receives like RPCRDMA, I allocate the ctxts and hang them on an idle queue (a doubly-linked list). In the unexpected handler, I dequeue a ctxt and post the matching receive. MX can then place the data in the proper buffer without an additional copy.

I chose not to pre-post the receives for the client's request messages since they could overwhelm the MX posted-receive list. By using the unexpected handler, only bulk IO receives are pre-posted (i.e., only after the request has come in).
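
A rough sketch of that flow, with an invented ctxt structure, idle
list, and post_matching_recv() helper standing in for the real MX
calls and the actual NFS/MX code (locking omitted):

#include <stddef.h>
#include <stdint.h>

struct ctxt {
        struct ctxt *next;      /* linkage on the idle list */
        void        *buf;       /* receive buffer for this request */
        size_t       len;
};

static struct ctxt *idle_head;  /* pre-allocated ctxts, not pre-posted */

/* Hypothetical stand-in: post a receive matching the incoming message. */
extern int post_matching_recv(uint64_t match, void *buf, size_t len);

/* Invoked when a request arrives with no matching receive posted. */
static int unexp_handler(uint64_t match, size_t length)
{
        struct ctxt *ctxt = idle_head;

        if (!ctxt)
                return -1;      /* no idle ctxt available; defer or drop */
        idle_head = ctxt->next;

        if (length > ctxt->len)
                length = ctxt->len;

        /*
         * Post the matching receive now, so the library can place the
         * data in ctxt->buf without an additional copy.
         */
        return post_matching_recv(match, ctxt->buf, length);
}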

Yes, and dedicating that much memory to clients is another. With the
IB and iWARP protocols and the current Linux server, these buffers are
not shared. This enhances integrity and protection, but it limits the
maximum scaling. I take it this is not a concern for you?

I am not sure what you mean by integrity and protection. A buffer is only used by one request at a time.

RPC is purely a request/response mechanism, with rules for discovering
endpoints and formatting requests and replies. RPCRDMA adds framing
for RDMA networks, and mechanisms for managing RDMA networks such
as credits and rules on when to use RDMA. Finally, the NFS/RDMA transport
binding imposes requirements on sending messages. Since there are several
NFS protocol versions, the answer to your question depends on which one.
There is no congestion control (slow start, message sizes) in the RPC
protocol itself; however, many RPC implementations provide it.
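
For context, a sketch of the fixed portion of the RPC-over-RDMA (v1)
header in which the credit value travels; the field names are
paraphrased from the protocol description rather than taken from any
particular implementation:

#include <stdint.h>

struct rpcrdma_header {
        uint32_t xid;      /* copied from the RPC transaction ID */
        uint32_t vers;     /* RPC-over-RDMA protocol version (1) */
        uint32_t credit;   /* requests the peer may have in flight */
        uint32_t proc;     /* message type: RDMA_MSG, RDMA_NOMSG, ... */
        /* followed by the read/write/reply chunk lists */
};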

I am trying to duplicate all of the above from RPCRDMA. I am curious why a client read of 256 pages with an rsize of 128 pages arrives in three transfers of 32, 128, and then 96 pages. I assume the same reason explains why client writes succeed only if the max pages is 32.

I'm not certain if your question is purely about TCP, or if it's about RDMA
with TCP as an example. However, in both cases the answer is the same:
it's not about the size of a message, it's about the message itself. If
the client and server have agreed that a 1 MB write is OK, then yes, the
client may immediately send 1 MB.

Tom.

Hmmm, I will try to debug the svc_process code to find the oops.

I am on vacation next week. I will take a look once I get back.

Thanks!

Scott
