> -----Original Message-----
> From: NeilBrown [mailto:neilb@xxxxxxx]
> Sent: Thursday, September 25, 2014 2:32 AM
> To: Strösser, Bodo
> Cc: linux-nfs@xxxxxxxxxxxxxxx; bfields@xxxxxxxxxxxx
> Subject: Re: rpc.mountd can be blocked by a bad client
>
> On Wed, 24 Sep 2014 12:57:09 +0200 "Strösser, Bodo"
> <bodo.stroesser@xxxxxxxxxxxxxx> wrote:
>
> > Hello,
> >
> > a few days ago we had some trouble with an NFS server. Most of the time the clients
> > could no longer mount any shares, but in rare cases they succeeded.
> >
> > We found that, during the times when mounts failed, rpc.mountd hung on a write() to a
> > TCP socket. netstat showed that Send-Q was full and Recv-Q counted up slowly. After a
> > long time the write ended with an error ("TCP timeout" IIRC), and rpc.mountd worked
> > normally for a short while until it again hung on write() for the same reason. The
> > problem was caused by a misconfigured MTU size. So one single bad client (or as many
> > clients as the number of threads used by rpc.mountd) can block rpc.mountd entirely.
> >
> > But what happens if someone intentionally sends RPC requests but doesn't read() the
> > answers? I wrote a small tool to test this situation. It fires DUMP requests at
> > rpc.mountd as fast as possible, but never reads from the socket. The result is the same
> > as with the problem above: rpc.mountd hangs in write() and no longer responds to other
> > requests, and this time no TCP timeout ends the hang.
> >
> > So it's quite easy to block rpc.mountd intentionally from a remote host.
>
> That's rather nasty.
> We could possibly set the socket to be non-blocking, or we could set an alarm
> just before handling a request.
> Probably rpc_dispatch() in support/nfs/rpcdispatch.c would be the best place
> to put the timeout.
>   catch SIGALRM (don't set SA_RESTART)
>   alarm(10);
>   call svc_sendreply
>   alarm(0);
>

I also thought about changing the socket to non-blocking.
But I'm not sure: can an RPC reply be so large that it doesn't fit into the socket buffer? If so, write() would put the first part into the buffer, and a second write() for the rest would fail (EAGAIN), since the first part probably hasn't been acked yet, right? So non-blocking mode would have to be combined with handling of buffer-full situations, I guess. Such handling, together with a timeout for starving connections, would be a clean solution.

To do that, one would have to replace the TCP write routine of the RPC library, i.e. change the write-function pointer in the XDR stream (xdrs). I don't know whether that can be done in a portable way that works on the different platforms.

About setting an alarm timeout: I'm not sure that rpc_dispatch() is the right place for it. mountd uses mount_dispatch(), which has an exit via svcerr_auth() that also sends a reply. So I think the timeout you suggest should be inserted in mount_dispatch(). OTOH, a timeout will shorten the hang, but bad clients can still slow mountd down dramatically.

BTW: AFAICS on Linux with libtirpc, the socket can indirectly be set to non-blocking using the control SVCGET_CONNMAXREC. That seems to result in write_vc() retrying write() in a loop for at most 2 seconds before it gives up.

One other point: AFAICS on Linux with libtirpc, mountd's listening socket is in blocking mode. Would that be a problem when running multiple "threads"? The comment in svc_socket.c/svc_socket(), where the listening socket is set to non-blocking, sounds very reasonable. But AFAICS, if libtirpc is used, O_NONBLOCK currently isn't set.

Bodo Stroesser

> if the alarm fires while svc_sendreply is writing to the socket it should get
> an error and close the connection.
>
> This would only fix mountd (as it is the only process to use rpc_dispatch).
> Is a similar thing needed for statd, I wonder? It isn't so important.
>
> NeilBrown
>
> >
> > Please CC me, I'm not on the list.
> >
> > Best regards,
> > Bodo
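Neil's alarm idea (SIGALRM caught without SA_RESTART, alarm() armed around the send, then cancelled) can be sketched in C roughly as follows. Assumptions: a plain blocking write() on a connected stream socket stands in for svc_sendreply(), and write_with_alarm() is a hypothetical helper, not a function in nfs-utils or libtirpc. Whether the RPC library's own write loop retries after EINTR is not checked here; if it does, the alarm alone would not break the hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* The handler does nothing; its only job is to make a blocked write() return. */
static void on_alarm(int sig)
{
    (void)sig;
}

/*
 * Hypothetical helper: send a reply on a blocking stream socket, but give up
 * after `secs` seconds. Returns 0 if all `len` bytes went out, -1 otherwise
 * (errno ETIMEDOUT if the alarm fired); on -1 the caller should close the
 * connection.
 */
static int write_with_alarm(int fd, const void *buf, size_t len, unsigned secs)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                  /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    alarm(secs);
    /* A blocking stream write normally returns early only on error or on a
       signal, so one call suffices: EINTR or a short count means timeout. */
    ssize_t n = write(fd, buf, len);
    int saved_errno = errno;
    alarm(0);                         /* cancel the alarm if it has not fired */
    sigaction(SIGALRM, &old, NULL);   /* restore the previous disposition */

    if (n == (ssize_t)len)
        return 0;
    errno = (n < 0 && saved_errno != EINTR) ? saved_errno : ETIMEDOUT;
    return -1;
}
```

As Bodo points out, every path that sends a reply would need this, including the svcerr_auth() exit in mount_dispatch(), and it only bounds each reply: a misbehaving client still costs up to `secs` seconds per request.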
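The "clean solution" Bodo describes (non-blocking socket, handling of a full send buffer, and a timeout for starving connections) could look roughly like this. It is a sketch under assumptions: send_all_deadline() and set_nonblocking() are hypothetical helpers, the caller is assumed to ignore SIGPIPE, and hooking this into the RPC library's XDR write routine, which is the portability question raised in the thread, is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds elapsed since *t0 on the monotonic clock. */
static long elapsed_ms(const struct timespec *t0)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - t0->tv_sec) * 1000L
         + (now.tv_nsec - t0->tv_nsec) / 1000000L;
}

/* Switch fd to non-blocking mode, keeping its other status flags. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: write all of buf to a non-blocking socket. When the
 * send buffer is full (EAGAIN), wait for POLLOUT, but never spend more than
 * timeout_ms in total on this reply. Returns 0 on success, -1 with errno set
 * (ETIMEDOUT if the peer stopped reading). Assumes SIGPIPE is ignored.
 */
static int send_all_deadline(int fd, const char *buf, size_t len, int timeout_ms)
{
    struct timespec t0;
    size_t off = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n > 0) {
            off += (size_t)n;         /* partial writes are expected here */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                /* real error, e.g. EPIPE */

        /* Buffer full: wait for space, but only until the deadline. */
        long left = timeout_ms - elapsed_ms(&t0);
        if (left <= 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        if (poll(&pfd, 1, (int)left) < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

Because the deadline covers the whole reply rather than a single write() call, a reply larger than the socket buffer is still sent piece by piece as the client acks data; only a client that stops reading is cut off.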