Re: The recent kref_put warning (was: [PATCH] sunrpc: remove unnecessary svc_xprt_put)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Feb 27, 2010 at 04:59:13PM +1100, Neil Brown wrote:
> On Sat, 27 Feb 2010 12:35:37 +1100
> Neil Brown <neilb@xxxxxxx> wrote:
> 
> >  So if XPT_CLOSE is set while xpo_recvfrom is being called, which I think
> >  is possible, and if ->xpo_recvfrom returns non-zero, then we end up
> >  processing a request on a dead socket, which doesn't sound like the right
> >  thing to do.  I don't think it can cause the present problem, but
> >  it looks wrong.  That last 'if' should just be an 'else'.
> >  I guess that would effectively reverse b0401d7253, though - not that
> >  that patch seems entirely right to me - if there is a problem I probably
> >  would have fixed it differently, though I'm not sure how.
> >  So maybe change "if (XPT_CLOSE)" to "if (len <= 0 && XPT_CLOSE)" ???
> 
> OK, I think I've nailed it.  I think  b0401d7253 is the culprit.
> Now let me see if I can convince you (and me).

Thanks, Neil, for the explanation.

> Firstly, why is this patch wrong.
> It claims:
>     
>     sunrpc: "Move close processing to a single place"
>     (d7979ae4a050a45b78af51832475001b68263d2a) moved the
>     close processing before the recvfrom method. This may
>     cause the close processing never to execute. So this
>     patch moves it to the right place.
>     
> The referenced commit did *not* move the close processing before the
> recvfrom method - it was already there.  The close processing was previously
> at the top of the individual recvfrom methods.  It was split out and common
> code with placed before the now-slightly-reduced recvfrom methods.
> This is functionally a null change.
> 
> However that doesn't explain why sometimes "the close processing [would]]
> never .. execute".
> The reason for this is subtle.  One the changes in commit d7979ae4a is
> 
>   err_delete:
> -       svc_delete_socket(svsk);
> +       set_bit(SK_CLOSE, &svsk->sk_flags);
>         return -EAGAIN;
> 
> This is insufficient. The recvfrom methods must always call svc_xprt_received
> on completion so that the socket gets re-queued if there is any more work to
> do.  This particular path did not make that call because it actually
> destroyed the svsk, making requeue pointless.  When the svc_delete_socket was
> change to just set a bit, we should have added a call to svc_xprt_received,
> but we didn't.  Sorry.  As I said, it was subtle.
> 
> So how is the b0401d7253 patch causing a problem?
> 
> svc_tcp_state_change - which can be called at any time - sets XPT_CLOSE.
> If this happens while svc_tcp_recvfrom is running and before one of the
> calls to svc_xprt_received, then svc_xprt_received will requeue the svsk for
> further processing (to handle the close).
> As soon a svc_tcp_recvfrom completes, svc_recv will notice that XPT_CLOSE is
> set and will close the socket, dropping the last refcount.

OK, so the rule (or one of the rules) that was violated here was that we
end up calling svc_delete_xprt() without XPT_BUSY?

> Subsequently the
> thread which the socket was queued to wakes up, calls svc_recv, and triggers
> the warning.
> 
> So the fix I propose is:
>  - make the XPT_CLOSE case in svc_recv once more exclusive with the
>    ->recvfrom case
>  - make sure all paths out of all recvfrom methods call svc_xprt_received.
>    Maybe it should be called after the ->xpo_recvfrom call instead.
> 
> So something like this?

Makes sense to me on a first look.  It would be helpful if people that
can reproduce this could test....

> I've made quite a few changes here - it might be worth splitting them
> up.

Probably so.

> One worth noting is that we now don't re-queue a udp
> socket at the earliest opportunity, but possibly do a
> csum_partial_copy_to_xdr before the requeue which could reduce performance
> slightly with udp on a multiprocessor.

Just because we're slower to get another CPU working on the next
incoming packet?

> I have no idea what the actual
> performance effect would be, but I think it makes the code a lot more robust
> (move the svc_xprt_received to just one place).

OK.

--b.

> 
> NeilBrown
> 
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 4f30336..2d99fb8 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -699,8 +699,12 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
>  	spin_unlock_bh(&pool->sp_lock);
>  
>  	len = 0;
> -	if (test_bit(XPT_LISTENER, &xprt->xpt_flags) &&
> -	    !test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
> +
> +	if (test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
> +		dprintk("svc_recv: found XPT_CLOSE\n");
> +		svc_delete_xprt(xprt);
> +		return 0;
> +	} else if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
>  		struct svc_xprt *newxpt;
>  		newxpt = xprt->xpt_ops->xpo_accept(xprt);
>  		if (newxpt) {
> @@ -726,23 +730,19 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
>  			svc_xprt_received(newxpt);
>  		}
>  		svc_xprt_received(xprt);
> -	} else if (!test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
> -		dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
> -			rqstp, pool->sp_id, xprt,
> -			atomic_read(&xprt->xpt_ref.refcount));
> -		rqstp->rq_deferred = svc_deferred_dequeue(xprt);
> -		if (rqstp->rq_deferred) {
> -			svc_xprt_received(xprt);
> -			len = svc_deferred_recv(rqstp);
> -		} else
> -			len = xprt->xpt_ops->xpo_recvfrom(rqstp);
> -		dprintk("svc: got len=%d\n", len);
> +		return 0;
>  	}
>  
> -	if (test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
> -		dprintk("svc_recv: found XPT_CLOSE\n");
> -		svc_delete_xprt(xprt);
> -	}
> +	dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
> +		rqstp, pool->sp_id, xprt,
> +		atomic_read(&xprt->xpt_ref.refcount));
> +	rqstp->rq_deferred = svc_deferred_dequeue(xprt);
> +	if (rqstp->rq_deferred)
> +		len = svc_deferred_recv(rqstp);
> +	else
> +		len = xprt->xpt_ops->xpo_recvfrom(rqstp);
> +	dprintk("svc: got len=%d\n", len);
> +	svc_xprt_received(xprt);
>  
>  	/* No data, incomplete (TCP) read, or accept() */
>  	if (len == 0 || len == -EAGAIN) {
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 870929e..22d9904 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -547,7 +547,6 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
>  			dprintk("svc: recvfrom returned error %d\n", -err);
>  			set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
>  		}
> -		svc_xprt_received(&svsk->sk_xprt);
>  		return -EAGAIN;
>  	}
>  	len = svc_addr_len(svc_addr(rqstp));
> @@ -562,11 +561,6 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
>  	svsk->sk_sk->sk_stamp = skb->tstamp;
>  	set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); /* there may be more data... */
>  
> -	/*
> -	 * Maybe more packets - kick another thread ASAP.
> -	 */
> -	svc_xprt_received(&svsk->sk_xprt);
> -
>  	len  = skb->len - sizeof(struct udphdr);
>  	rqstp->rq_arg.len = len;
>  
> @@ -917,7 +911,6 @@ static int svc_tcp_recv_record(struct svc_sock *svsk, struct svc_rqst *rqstp)
>  		if (len < want) {
>  			dprintk("svc: short recvfrom while reading record "
>  				"length (%d of %d)\n", len, want);
> -			svc_xprt_received(&svsk->sk_xprt);
>  			goto err_again; /* record header not complete */
>  		}
>  
> @@ -953,7 +946,6 @@ static int svc_tcp_recv_record(struct svc_sock *svsk, struct svc_rqst *rqstp)
>  	if (len < svsk->sk_reclen) {
>  		dprintk("svc: incomplete TCP record (%d of %d)\n",
>  			len, svsk->sk_reclen);
> -		svc_xprt_received(&svsk->sk_xprt);
>  		goto err_again;	/* record not complete */
>  	}
>  	len = svsk->sk_reclen;
> @@ -961,10 +953,9 @@ static int svc_tcp_recv_record(struct svc_sock *svsk, struct svc_rqst *rqstp)
>  
>  	return len;
>   error:
> -	if (len == -EAGAIN) {
> +	if (len == -EAGAIN)
>  		dprintk("RPC: TCP recv_record got EAGAIN\n");
> -		svc_xprt_received(&svsk->sk_xprt);
> -	}
> +
>  	return len;
>   err_delete:
>  	set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> @@ -1109,18 +1100,14 @@ out:
>  	svsk->sk_tcplen = 0;
>  
>  	svc_xprt_copy_addrs(rqstp, &svsk->sk_xprt);
> -	svc_xprt_received(&svsk->sk_xprt);
>  	if (serv->sv_stats)
>  		serv->sv_stats->nettcpcnt++;
>  
>  	return len;
>  
>  err_again:
> -	if (len == -EAGAIN) {
> +	if (len == -EAGAIN)
>  		dprintk("RPC: TCP recvfrom got EAGAIN\n");
> -		svc_xprt_received(&svsk->sk_xprt);
> -		return len;
> -	}
>  error:
>  	if (len != -EAGAIN) {
>  		printk(KERN_NOTICE "%s: recvfrom returned errno %d\n",
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index f92e37e..0194de8 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -566,7 +566,6 @@ static int rdma_read_complete(struct svc_rqst *rqstp,
>  		ret, rqstp->rq_arg.len,	rqstp->rq_arg.head[0].iov_base,
>  		rqstp->rq_arg.head[0].iov_len);
>  
> -	svc_xprt_received(rqstp->rq_xprt);
>  	return ret;
>  }
>  
> @@ -665,7 +664,6 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>  		rqstp->rq_arg.head[0].iov_len);
>  	rqstp->rq_prot = IPPROTO_MAX;
>  	svc_xprt_copy_addrs(rqstp, xprt);
> -	svc_xprt_received(xprt);
>  	return ret;
>  
>   close_out:
> @@ -678,6 +676,5 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>  	 */
>  	set_bit(XPT_CLOSE, &xprt->xpt_flags);
>  defer:
> -	svc_xprt_received(xprt);
>  	return 0;
>  }
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux