On Tue, Oct 26, 2010 at 12:05:17PM -0400, J. Bruce Fields wrote:
> On Tue, Oct 26, 2010 at 08:59:47AM -0400, J. Bruce Fields wrote:
> > On Tue, Oct 26, 2010 at 12:28:20PM +1100, Neil Brown wrote:
> > > On Mon, 25 Oct 2010 20:30:08 -0400
> > > "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
> > > >
> > > > svcrpc: simplify svc_close_xprt, fix minor race
> > > >
> > > > We're assuming that in the XPT_BUSY case, somebody else will take care
> > > > of the close.  But that's not necessarily the case (e.g. if
> > > > svc_xprt_enqueue() exits due to a lack of write space).  So the close
> > > > could in theory be delayed indefinitely.  We should be calling
> > > > svc_xprt_enqueue() after every set of XPT_CLOSE.
> > > >
> > > > Also with svc_close_all() no longer relying on svc_close, it no longer
> > > > needs to be able to immediately do the svc_delete_xprt() itself.
> > > >
> > > > Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>
> > > >
> > > > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > > > index 8c018df..dfee123 100644
> > > > --- a/net/sunrpc/svc_xprt.c
> > > > +++ b/net/sunrpc/svc_xprt.c
> > > > @@ -928,11 +928,7 @@ void svc_delete_xprt(struct svc_xprt *xprt)
> > > >  void svc_close_xprt(struct svc_xprt *xprt)
> > > >  {
> > > >  	set_bit(XPT_CLOSE, &xprt->xpt_flags);
> > > > -	if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags))
> > > > -		/* someone else will have to effect the close */
> > > > -		return;
> > > > -
> > > > -	svc_delete_xprt(xprt);
> > > > +	svc_xprt_enqueue(xprt);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(svc_close_xprt);
> > > >
> > >
> > > I'm not quite so sure about this one.
> > >
> > > svc_close_xprt gets called when user-space writes some magic string to some
> > > magic file.  This could be done when there are no active nfsd threads.
> > > It is a bit far fetched, but you could tell nfsd to open a socket, then tell
> > > it to close it, before telling it to start any threads.
> > >
> > > In that case the socket would stay open until the first thread was started.
> > Thanks, that's an odd case, but I think you're right.  OK:

Actually.... I think the real race here is the fault of the write-space
check: once we've taken XPT_BUSY, we really do need to guarantee that
we'll arrange to have svc_xprt_enqueue() called again after clearing
XPT_BUSY, to avoid missing any events that arrived while we were
processing ours.  So putting the wspace check after taking XPT_BUSY in
svc_xprt_enqueue looks to me like a bug that could cause processing to
stall unnecessarily in some unlikely cases.

I'm going to apply this for 2.6.38 if you can't think of a reason not
to.

(My one last-minute worry was whether there was some reason
xpo_wspace() needed to be called with XPT_BUSY held; but I can't see
any.)
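The rule, stated concretely: whoever wins XPT_BUSY owns the transport,
and any other svc_xprt_enqueue() caller will see the bit set and back
off, so the owner has to redo the enqueue check when it drops the bit.
That is roughly what svc_xprt_received() is for; a simplified sketch of
that pattern (not part of either patch, and the in-tree version has a
little more to it than this):

void svc_xprt_received(struct svc_xprt *xprt)
{
	BUG_ON(!test_bit(XPT_BUSY, &xprt->xpt_flags));
	/*
	 * Dropping XPT_BUSY must be followed by another enqueue pass:
	 * any svc_xprt_enqueue() that ran while we held the bit backed
	 * off, counting on us to notice its event when we finish.
	 */
	clear_bit(XPT_BUSY, &xprt->xpt_flags);
	svc_xprt_enqueue(xprt);
}

The old wspace check broke that pattern from the other side:
svc_xprt_enqueue() took XPT_BUSY, decided there was no write space, and
cleared the bit without any further enqueue, so a svc_reserve() that
freed space in that window was never noticed.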
--b.

> > --b.
>
> commit cf501cb479873d6e77abba131ea40c11aa924d72
> Author: J. Bruce Fields <bfields@xxxxxxxxxx>
> Date:   Tue Oct 26 11:32:03 2010 -0400
>
>     svcrpc: fix wspace-checking race
>
>     We call svc_xprt_enqueue() after something happens which we think may
>     require handling from a server thread.  To avoid such events being lost,
>     svc_xprt_enqueue() must guarantee that there will be a svc_recv() call
>     from a server thread following any such event.  It does that by either
>     waking up a server thread itself, or checking that XPT_BUSY is set (in
>     which case somebody else is doing it).
>
>     But the check of XPT_BUSY could occur just as someone finishes
>     processing some other event, and just before they clear XPT_BUSY.
>
>     Therefore it's important not to clear XPT_BUSY without subsequently
>     doing another svc_xprt_enqueue() to check whether the xprt should be
>     requeued.
>
>     The xpo_wspace() check in svc_xprt_enqueue() breaks this rule, allowing
>     an event to be missed in situations like:
>
>     	data arrives
>     	call svc_tcp_data_ready():
>     	call svc_xprt_enqueue():
>     	set BUSY
>     	find no write space
>     				svc_reserve():
>     				free up write space
>     				call svc_xprt_enqueue():
>     				test BUSY
>     	clear BUSY
>
>     So, instead, check wspace in the same places that the state flags are
>     checked: before taking BUSY, and in svc_recv().
>
>     Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>
>
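The svc_reserve() side of that diagram, roughly as it stands in this
code base (simplified here for illustration, not part of the patch), is
what frees the write space and then re-runs the enqueue:

void svc_reserve(struct svc_rqst *rqstp, int space)
{
	space += rqstp->rq_res.head[0].iov_len;

	if (space < rqstp->rq_reserved) {
		struct svc_xprt *xprt = rqstp->rq_xprt;

		/* Give back the reply space we no longer need... */
		atomic_sub((rqstp->rq_reserved - space), &xprt->xpt_reserved);
		rqstp->rq_reserved = space;

		/* ...and let svc_xprt_enqueue() re-evaluate the transport. */
		svc_xprt_enqueue(xprt);
	}
}

If that svc_xprt_enqueue() call finds XPT_BUSY already set, the holder
of the bit is the only one left who can notice the newly freed space,
which is why the wspace test either has to be redone when BUSY is
dropped or, as the patch below does, moved out from under BUSY
entirely.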
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> index d2261fe..9c690fa 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -92,12 +92,17 @@ static inline void unregister_xpt_user(struct svc_xprt *xpt, struct svc_xpt_user
>  static inline int register_xpt_user(struct svc_xprt *xpt, struct svc_xpt_user *u)
>  {
>  	spin_lock(&xpt->xpt_lock);
> -	list_add(&u->list, &xpt->xpt_users);
> -	spin_unlock(&xpt->xpt_lock);
>  	if (test_bit(XPT_CLOSE, &xpt->xpt_flags)) {
> -		unregister_xpt_user(xpt, u);
> +		/*
> +		 * The connection is about to be deleted soon (or,
> +		 * worse, may already be deleted--in which case we've
> +		 * already notified the xpt_users).
> +		 */
> +		spin_unlock(&xpt->xpt_lock);
>  		return -ENOTCONN;
>  	}
> +	list_add(&u->list, &xpt->xpt_users);
> +	spin_unlock(&xpt->xpt_lock);
>  	return 0;
>  }
>
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 99996ad..04238a9 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -302,6 +302,15 @@ static void svc_thread_dequeue(struct svc_pool *pool, struct svc_rqst *rqstp)
>  	list_del(&rqstp->rq_list);
>  }
>
> +static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
> +{
> +	if (xprt->xpt_flags & ((1<<XPT_CONN)|(1<<XPT_CLOSE)))
> +		return true;
> +	if (xprt->xpt_flags & ((1<<XPT_DATA)|(1<<XPT_DEFERRED)))
> +		return xprt->xpt_ops->xpo_has_wspace(xprt);
> +	return false;
> +}
> +
>  /*
>   * Queue up a transport with data pending. If there are idle nfsd
>   * processes, wake 'em up.
> @@ -314,8 +323,7 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
>  	struct svc_rqst *rqstp;
>  	int cpu;
>
> -	if (!(xprt->xpt_flags &
> -	      ((1<<XPT_CONN)|(1<<XPT_DATA)|(1<<XPT_CLOSE)|(1<<XPT_DEFERRED))))
> +	if (!svc_xprt_has_something_to_do(xprt))
>  		return;
>
>  	cpu = get_cpu();
> @@ -345,25 +353,6 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
>  	BUG_ON(xprt->xpt_pool != NULL);
>  	xprt->xpt_pool = pool;
>
> -	/* Handle pending connection */
> -	if (test_bit(XPT_CONN, &xprt->xpt_flags))
> -		goto process;
> -
> -	/* Handle close in-progress */
> -	if (test_bit(XPT_CLOSE, &xprt->xpt_flags))
> -		goto process;
> -
> -	/* Check if we have space to reply to a request */
> -	if (!xprt->xpt_ops->xpo_has_wspace(xprt)) {
> -		/* Don't enqueue while not enough space for reply */
> -		dprintk("svc: no write space, transport %p not enqueued\n",
> -			xprt);
> -		xprt->xpt_pool = NULL;
> -		clear_bit(XPT_BUSY, &xprt->xpt_flags);
> -		goto out_unlock;
> -	}
> -
> - process:
>  	if (!list_empty(&pool->sp_threads)) {
>  		rqstp = list_entry(pool->sp_threads.next,
>  				   struct svc_rqst,
> @@ -744,7 +733,7 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
>  			spin_unlock_bh(&serv->sv_lock);
>  			svc_xprt_received(newxpt);
>  		}
> -	} else {
> +	} else if (xprt->xpt_ops->xpo_has_wspace(xprt)) {
>  		dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
>  			rqstp, pool->sp_id, xprt,
>  			atomic_read(&xprt->xpt_ref.refcount));
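As for the register_xpt_user() hunk, the point of the reordering is
that an entry only ever lands on xpt_users if, under xpt_lock, the
transport was not already closing; a caller can then trust the
-ENOTCONN return rather than racing with its own notification.  A
hypothetical caller (names invented here for illustration; the callback
member of struct svc_xpt_user is as I recall it in this tree) would be
shaped like:

struct my_bc_conn {
	struct svc_xpt_user	xpt_user;
	struct svc_xprt		*xprt;
};

static void my_bc_conn_lost(struct svc_xpt_user *u)
{
	struct my_bc_conn *conn = container_of(u, struct my_bc_conn, xpt_user);

	/* The transport is being deleted; drop anything that depends on it. */
	conn->xprt = NULL;
}

static int my_bc_conn_setup(struct svc_xprt *xprt, struct my_bc_conn *conn)
{
	conn->xprt = xprt;
	conn->xpt_user.callback = my_bc_conn_lost;

	/*
	 * With the hunk above, -ENOTCONN means we were never added to
	 * xpt_users, so my_bc_conn_lost() will not be called for this
	 * registration; previously the entry was momentarily visible
	 * even when the xprt was already on its way out.
	 */
	return register_xpt_user(xprt, &conn->xpt_user);
}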