Re: [PATCH v4 1/2] SUNRPC: Fixup v4.1 backchannel request timeouts

On Thu, Jan 04, 2024 at 10:20:55AM -0500, Benjamin Coddington wrote:
> On 4 Jan 2024, at 10:09, Chuck Lever wrote:
> 
> > On Thu, Jan 04, 2024 at 09:58:45AM -0500, Benjamin Coddington wrote:
> >> After commit 59464b262ff5 ("SUNRPC: SOFTCONN tasks should time out when on
> >> the sending list"), any 4.1 backchannel tasks placed on the sending queue
> >                       ^^^
> >
> > "any" ? I found that this problem occurs only when the transport
> > write lock is held (i.e., when the forechannel is sending a Call).
> > If the transport is idle, things work as expected. But OK, maybe
> > your reproducer is different than mine.
> 
> Any that are _placed on the sending queue_.

Ah, I misremembered: I thought all to-be-sent tasks were placed on
the sending queue. But no, only the ones that are put to sleep are.


> > One more comment below.
> >
> >
> >> would immediately return with -ETIMEDOUT since their req timers are zero.
> >>
> >> Initialize the backchannel's rpc_rqst timeout parameters from the xprt's
> >> default timeout settings.
> >>
> >> Fixes: 59464b262ff5 ("SUNRPC: SOFTCONN tasks should time out when on the sending list")
> >> Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx>
> >> ---
> >>  net/sunrpc/xprt.c | 23 ++++++++++++++---------
> >>  1 file changed, 14 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> >> index 2364c485540c..6cc9ffac962d 100644
> >> --- a/net/sunrpc/xprt.c
> >> +++ b/net/sunrpc/xprt.c
> >> @@ -651,9 +651,9 @@ static unsigned long xprt_abs_ktime_to_jiffies(ktime_t abstime)
> >>  		jiffies + nsecs_to_jiffies(-delta);
> >>  }
> >>
> >> -static unsigned long xprt_calc_majortimeo(struct rpc_rqst *req)
> >> +static unsigned long xprt_calc_majortimeo(struct rpc_rqst *req,
> >> +		const struct rpc_timeout *to)
> >>  {
> >> -	const struct rpc_timeout *to = req->rq_task->tk_client->cl_timeout;
> >>  	unsigned long majortimeo = req->rq_timeout;
> >>
> >>  	if (to->to_exponential)
> >> @@ -665,9 +665,10 @@ static unsigned long xprt_calc_majortimeo(struct rpc_rqst *req)
> >>  	return majortimeo;
> >>  }
> >>
> >> -static void xprt_reset_majortimeo(struct rpc_rqst *req)
> >> +static void xprt_reset_majortimeo(struct rpc_rqst *req,
> >> +		const struct rpc_timeout *to)
> >>  {
> >> -	req->rq_majortimeo += xprt_calc_majortimeo(req);
> >> +	req->rq_majortimeo += xprt_calc_majortimeo(req, to);
> >>  }
> >>
> >>  static void xprt_reset_minortimeo(struct rpc_rqst *req)
> >> @@ -675,7 +676,8 @@ static void xprt_reset_minortimeo(struct rpc_rqst *req)
> >>  	req->rq_minortimeo += req->rq_timeout;
> >>  }
> >>
> >> -static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req)
> >> +static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req,
> >> +		const struct rpc_timeout *to)
> >>  {
> >>  	unsigned long time_init;
> >>  	struct rpc_xprt *xprt = req->rq_xprt;
> >> @@ -684,8 +686,9 @@ static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req)
> >>  		time_init = jiffies;
> >>  	else
> >>  		time_init = xprt_abs_ktime_to_jiffies(task->tk_start);
> >> -	req->rq_timeout = task->tk_client->cl_timeout->to_initval;
> >> -	req->rq_majortimeo = time_init + xprt_calc_majortimeo(req);
> >> +
> >> +	req->rq_timeout = to->to_initval;
> >> +	req->rq_majortimeo = time_init + xprt_calc_majortimeo(req, to);
> >>  	req->rq_minortimeo = time_init + req->rq_timeout;
> >>  }
> >>
> >> @@ -713,7 +716,7 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
> >>  	} else {
> >>  		req->rq_timeout = to->to_initval;
> >>  		req->rq_retries = 0;
> >> -		xprt_reset_majortimeo(req);
> >> +		xprt_reset_majortimeo(req, to);
> >>  		/* Reset the RTT counters == "slow start" */
> >>  		spin_lock(&xprt->transport_lock);
> >>  		rpc_init_rtt(req->rq_task->tk_client->cl_rtt, to->to_initval);
> >> @@ -1886,7 +1889,7 @@ xprt_request_init(struct rpc_task *task)
> >>  	req->rq_snd_buf.bvec = NULL;
> >>  	req->rq_rcv_buf.bvec = NULL;
> >>  	req->rq_release_snd_buf = NULL;
> >> -	xprt_init_majortimeo(task, req);
> >> +	xprt_init_majortimeo(task, req, task->tk_client->cl_timeout);
> >>
> >>  	trace_xprt_reserve(req);
> >>  }
> >> @@ -1996,6 +1999,8 @@ xprt_init_bc_request(struct rpc_rqst *req, struct rpc_task *task)
> >>  	 */
> >>  	xbufp->len = xbufp->head[0].iov_len + xbufp->page_len +
> >>  		xbufp->tail[0].iov_len;
> >> +
> >
> > +	/*
> > +	 * Backchannel Replies are sent with !RPC_TASK_SOFT and
> > +	 * RPC_TASK_NO_RETRANS_TIMEOUT. The major timeout setting
> > +	 * affects only how long each Reply waits to be sent when
> > +	 * a transport connection cannot be established.
> > +	 */
> 
> I put this on 2/2 like I said in my earlier response.  I've been trying not
> to make a delta on 1/2 (yes, even though it's just a comment) because there's
> a nonzero chance a maintainer is currently testing it to fix 6.7.  I
> probably should not have made these two into a series, except that the 2nd
> depends on the 1st.
> 
> If you definitely want it here instead, I will send a v5.

Got it, I didn't realize 1/2 was immutable at this point.


> I think we're probably going to be stuck with a broken 6.7 at this
> point.

Well, 6.7.0 might have the bug, but unless I've missed something,
1/2 will get backported to 6.7.y pretty quickly, even if it goes
in during the 6.8 merge window.


-- 
Chuck Lever



