Re: rpc.mountd can be blocked by a bad client

bstroesser@xxxxxxxxxxxxxx · 09 Oct 2014 15:42:28 +0200

> -----Original Message-----
> From: Strösser, Bodo
> Sent: Thursday, September 25, 2014 12:22 PM
> To: 'NeilBrown'
> Cc: linux-nfs@xxxxxxxxxxxxxxx; bfields@xxxxxxxxxxxx
> Subject: RE: rpc.mountd can be blocked by a bad client
> 
> > -----Original Message-----
> > From: NeilBrown [mailto:neilb@xxxxxxx]
> > Sent: Thursday, September 25, 2014 2:32 AM
> > To: Strösser, Bodo
> > Cc: linux-nfs@xxxxxxxxxxxxxxx; bfields@xxxxxxxxxxxx
> > Subject: Re: rpc.mountd can be blocked by a bad client
> >
> > On Wed, 24 Sep 2014 12:57:09 +0200 "Strösser, Bodo"
> > <bodo.stroesser@xxxxxxxxxxxxxx> wrote:
> >
> > > Hello,
> > >
> > > a few days ago we had some trouble with a NFS server. The clients
> > > most of the time no longer could mount any shares, but in rare
> > > cases they had success.
> > >
> > > We found out, that during the times when mounts failed, rpc.mountd
> > > hung on a write() to a TCP socket. netstat showed, that Send-Q was
> > > full and Recv-Q counted up slowly. After a long time the write
> > > ended with an error ("TCP timeout" IIRC) and rpc.mountd worked
> > > normally for a short while until it again hung on write() for the
> > > same reason.
> > > 
> > > The problem was caused by a MTU size configured wrong. So, one
> > > single bad client (or as much clients as the number of threads used
> > > by rpc.mountd) can block rpc.mountd entirely.
> > >
> > > But what will happen, if someone intentionally sends RPC requests,
> > > but doesn't read() the answers? I wrote a small tool to test this
> > > situation. It fires DUMP requests to rpc.mountd as fast as
> > > possible, but does not read from the socket. The result is the
> > > same as with the problem above: rpc.mountd hangs in write() and no
> > > longer responds to other requests while no TCP timeout breaks up
> > > this situation.
> > >
> > > So it's quite easy to intentionally block rpc.mountd from remote.
> >
> > That's rather nasty.
> > We could possibly set the socket to be non-blocking, or we could set an alarm
> > just before handling a request.
> > Probably rpc_dispatch() in support/nfs/rpcdispatch.c would be the best place
> > to put the timeout.
> >  catch SIGALRM (don't set SA_RESTART)
> >  alarm(10);
> >  call svc_sendreply
> >  alarm(0);
> >
> 
> I also thought about changing the socket to non-blocking. But I'm not sure: is it
> possible to have such big RPC replies, that they don't fit into the socket
> buffer? If so, write() would put the first part into the buffer and a second
> write for the rest would fail, as probably the first part isn't acked yet, right?
> So, non-blocking needs to be combined with a handling of buffer-full situations,
> I guess. Such a handling together with a timeout for starving connections would
> be a clean solution.
> To do that, one would have to replace the tcp write routine of the rpc library.
> That means to change the xdrs's pointer to the write function. I don't know,
> whether that can be done in a portable way, which works at the different platforms.
> 
> About setting a alarm timeout: I'm not sure, that rpc_dispatch() is the right
> place for it. mountd uses mount_dispatch() which has an exit via svcerr_auth(),
> that again sends a reply. So the timeout you suggest should be inserted in
> mount_dispatch(), I think.
> OTOH, a timeout will shorten the hang, but bad clients can still slow down mountd
> extremely.
> 
> BTW: AFAICS on Linux with libtirpc, using the control SVCGET_CONNMAXREC, the socket
> indirectly can set to non-blocking. That seems to result in write_vc() doing a max.
> 2 second loop of write() until it gives up.

Meanwhile I've found some time to do further investigations.

rpcbind uses the above-mentioned rpc_control(RPC_SVC_CONNMAXREC_SET) to
switch libtirpc to non-blocking mode. So I tested a similar attack against
rpcbind. The non-blocking mode shows two positive effects:
- an attacker sending requests as fast as possible to rpcbind will have no
  success. As soon as rpcbind/libtirpc finds more than one request readable
  at the socket, it closes the connection.
- if the socket buffer is full, write() fails with EAGAIN. libtirpc
  retries the write in a loop for at most 2 seconds, then closes the
  connection.

Unfortunately the write retry loop in libtirpc has a bug: on each failed
write() it increments the length of, and decrements the pointer to, the
retry buffer.
I sent a patch to libtirpc-devel about 10 days ago, but haven't received a
response yet.

Regarding rpc.mountd, I've found that using multiple processes (e.g. -t 4)
doesn't work well. When using libtirpc, or when not using libtirpc but setting
the -p xxxx option, the listening sockets (TCP listener and UDP socket) are
not in non-blocking mode. Thus, if a single connection request comes in, all
threads wake up from select(), but only one accept() succeeds. All other
threads will wait in accept() for further connection requests.
If an RPC request comes in via UDP, what happens is very similar: all threads
wake up, one thread handles the request, and all others wait in read() for
further UDP requests.
As TCP connections are assigned to specific threads, all connections handled
by one thread will be blocked as long as that thread waits in accept() or
read(). Thus, I've written two patches (see below) that set all listeners to
non-blocking in support/nfs/*

The third patch below inserts rpc_control(RPC_SVC_CONNMAXREC_SET) into
nfs_svc_create() in support/nfs/svc_create.c for the libtirpc case.
That patch hardens rpc.mountd against DOS attacks (and probably also statd,
as it also uses nfs_svc_create()).

The patches below are for nfs-utils-1.3.1, but this version is untested!
(I couldn't build it because of dependencies, and now I'm running out of time.)

My version of the patches for nfs-utils-1.2.3-18.33.1 is tested on SLES11-SP3.
Please treat the third patch as an RFC only. I'm not sure whether setting
MAXREC might have negative side effects, as I'm not an RPC expert.

Bodo

> 
> One other point: AFAICS on Linux with libtirpc the listening socket of mountd is
> in blocking mode. Would that be a problem when running multiple "threads"?
> The comment in svc_socket.c/svc_socket(), where the listening socket is set to
> non-blocking, sounds very reasonable. But AFAICS if libtirpc is used, O_NONBLOCK
> currently isn't set.
> 
> Bodo Stroesser
> 
> 
> > if the alarm fires while svc_sendreply is writing to the socket it should get
> > an error and close the connection.
> >
> > This would only fix mountd (as it is the only process to use rpc_dispatch).
> > Is a similar thing needed for statd I wonder??  It isn't so important.
> >
> > NeilBrown
> >
> > >
> > > Please CC me, I'm not on the list.
> > >
> > > Best regards,
> > > Bodo



---------------------------------------

From: Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx>
Date: Thu, 09 Oct 2014 13:06:19 +0200
Subject: [PATCH] nfs-util: mountd: set nonblocking mode if no libtirpc

If mountd is built without libtirpc and it is started using "-p XXX" option,
the tcp listeners and the sockets waiting for UDP messages are not in
non-blocking mode. Thus if running with multiple threads (-t XX),
all threads will wake up from select on a connection request or a UDP message,
but only one thread will succeed. All others will wait on accept() or read()
for the next event.

Signed-off-by: Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx>
---

--- nfs-utils-1.3.1/support/include/nfslib.h	2014-10-09 12:52:30.000000000 +0200
+++ nfs-utils-1.3.1/support/include/nfslib.h	2014-10-09 12:53:37.000000000 +0200
@@ -174,6 +174,7 @@ void closeall(int min);
 
 int			svctcp_socket (u_long __number, int __reuse);
 int			svcudp_socket (u_long __number);
+int			svcsock_nonblock (int __sock);
 
 /* Misc shared code prototypes */
 size_t  strlcat(char *, const char *, size_t);
--- nfs-utils-1.3.1/support/nfs/svc_socket.c	2014-10-09 12:56:14.000000000 +0200
+++ nfs-utils-1.3.1/support/nfs/svc_socket.c	2014-10-09 13:10:44.000000000 +0200
@@ -76,6 +76,39 @@ int getservport(u_long number, const cha
 	return 0;
 }
 
+int
+svcsock_nonblock(int sock)
+{
+	int flags;
+
+	if (sock < 0)
+		return sock;
+
+	/* This socket might be shared among multiple processes
+	 * if mountd is run multi-threaded.  So it is safest to
+	 * make it non-blocking, else all threads might wake
+	 * one will get the data, and the others will block
+	 * indefinitely.
+	 * In all cases, transaction on this socket are atomic
+	 * (accept for TCP, packet-read and packet-write for UDP)
+	 * so O_NONBLOCK will not confuse unprepared code causing
+	 * it to corrupt messages.
+	 * It generally safest to have O_NONBLOCK when doing an accept
+	 * as if we get a RST after the SYN and before accept runs,
+	 * we can block despite being told there was an acceptable
+	 * connection.
+	 */
+	if ((flags = fcntl(sock, F_GETFL)) < 0)
+		perror(_("svc_socket: can't get socket flags"));
+	else if (fcntl(sock, F_SETFL, flags|O_NONBLOCK) < 0)
+		perror(_("svc_socket: can't set socket flags"));
+	else
+		return sock;
+
+	(void) __close(sock);
+	return -1;
+}
+
 static int
 svc_socket (u_long number, int type, int protocol, int reuse)
 {
@@ -113,38 +146,7 @@ svc_socket (u_long number, int type, int
       sock = -1;
     }
 
-  if (sock >= 0)
-    {
-	    /* This socket might be shared among multiple processes
-	     * if mountd is run multi-threaded.  So it is safest to
-	     * make it non-blocking, else all threads might wake
-	     * one will get the data, and the others will block
-	     * indefinitely.
-	     * In all cases, transaction on this socket are atomic
-	     * (accept for TCP, packet-read and packet-write for UDP)
-	     * so O_NONBLOCK will not confuse unprepared code causing
-	     * it to corrupt messages.
-	     * It generally safest to have O_NONBLOCK when doing an accept
-	     * as if we get a RST after the SYN and before accept runs,
-	     * we can block despite being told there was an acceptable
-	     * connection.
-	     */
-	int flags;
-	if ((flags = fcntl(sock, F_GETFL)) < 0)
-	  {
-	      perror (_("svc_socket: can't get socket flags"));
-	      (void) __close (sock);
-	      sock = -1;
-	  }
-	else if (fcntl(sock, F_SETFL, flags|O_NONBLOCK) < 0)
-	  {
-	      perror (_("svc_socket: can't set socket flags"));
-	      (void) __close (sock);
-	      sock = -1;
-	  }
-    }
-
-  return sock;
+  return svcsock_nonblock(sock);
 }
 
 /*
--- nfs-utils-1.3.1/support/nfs/rpcmisc.c	2014-10-08 21:22:04.000000000 +0200
+++ nfs-utils-1.3.1/support/nfs/rpcmisc.c	2014-10-08 21:22:36.000000000 +0200
@@ -104,7 +104,7 @@ makesock(int port, int proto)
 		return -1;
 	}
 
-	return sock;
+	return svcsock_nonblock(sock);
 }
 
 void

--------------------------------------------------

From: Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx>
Date: Thu, 09 Oct 2014 13:07:33 +0200
Subject: [PATCH] nfs-util: mountd: set nonblocking mode with libtirpc

If mountd is built with libtirpc the tcp listeners and the sockets
waiting for UDP messages are not in non-blocking mode. Thus if running
with multiple threads (-t XX), all threads will wake up from select on
a connection request or a UDP message, but only one thread will succeed.
All others will wait on accept() or read() for the next event.

Signed-off-by: Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx>
---

--- nfs-utils-1.2.3/support/nfs/svc_create.c	2014-10-08 21:39:01.000000000 +0200
+++ nfs-utils-1.2.3/support/nfs/svc_create.c	2014-10-08 22:20:02.000000000 +0200
@@ -277,6 +277,12 @@
 			"(%s, %u, %s)", name, version, nconf->nc_netid);
 		return 0;
 	}
+	if (svcsock_nonblock(xprt->xp_fd) < 0) {
+		/* close() already done by svcsock_nonblock() */
+		xprt->xp_fd = RPC_ANYFD;
+		SVC_DESTROY(xprt);
+		return 0;
+	}
 
 	if (!svc_reg(xprt, program, version, dispatch, nconf)) {
 		/* svc_reg(3) destroys @xprt in this case */
@@ -332,6 +338,7 @@
 		int fd;
 
 		fd = svc_create_sock(ai->ai_addr, ai->ai_addrlen, nconf);
+		fd = svcsock_nonblock(fd);
 		if (fd == -1)
 			goto out_free;
 

--------------------------------------------------

From: Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx>
Date: Thu, 09 Oct 2014 13:06:19 +0200
Subject: [PATCH] nfs-util: mountd: set libtirpc nonblocking mode to avoid DOS

This patch is experimental. It works fine in that it removes the vulnerability
to a DOS attack. rpc.mountd can be blocked by a bad client that sends many
RPC requests but never reads the responses. This might happen intentionally
or be caused by a wrong network config (MTU).
The patch switches on the non-blocking mode of libtirpc. In that mode writes can
block for a max. of 2 seconds. Attackers are forced to send requests more slowly,
as libtirpc will close a connection if it finds two requests to read at the same
time.
I do not know whether setting MAXREC could cause trouble, e.g. with big replies.
 

Signed-off-by: Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx>
---

--- nfs-utils-1.2.3/support/nfs/svc_create.c	2014-10-09 12:09:15.000000000 +0200
+++ nfs-utils-1.2.3/support/nfs/svc_create.c	2014-10-09 12:13:32.000000000 +0200
@@ -49,6 +49,8 @@
 
 #ifdef HAVE_LIBTIRPC
 
+#include <rpc/rpc_com.h>
+
 #define SVC_CREATE_XPRT_CACHE_SIZE	(8)
 static SVCXPRT *svc_create_xprt_cache[SVC_CREATE_XPRT_CACHE_SIZE] = { NULL, };
 
@@ -401,6 +403,7 @@
 	const struct sigaction create_sigaction = {
 		.sa_handler	= SIG_IGN,
 	};
+	int maxrec = RPC_MAXDATASIZE;
 	unsigned int visible, up, servport;
 	struct netconfig *nconf;
 	void *handlep;
@@ -412,6 +415,20 @@
 	 */
 	(void)sigaction(SIGPIPE, &create_sigaction, NULL);
 
+	/*
+	 * Setting MAXREC also enables non-blocking mode for tcp connections.
+	 * This avoids DOS attacks by a client sending many requests but never
+	 * reading the reply:
+	 * - if a second request already is present for reading in the socket,
+	 *   after the first request just was read, libtirpc will break the
+	 *   connection. Thus an attacker can't simply send requests as fast as
+	 *   he can without waiting for the response.
+	 * - if the write buffer of the socket is full, the next write() will
+	 *   fail with EAGAIN. libtirpc will retry the write in a loop for max.
+	 *   2 seconds. If write still fails, the connection will be closed.
+	 */   
+	rpc_control(RPC_SVC_CONNMAXREC_SET, &maxrec);
+
 	handlep = setnetconfig();
 	if (handlep == NULL) {
 		xlog(L_ERROR, "Failed to access local netconfig database: %s",