Re: error=Invalid slot

> On Apr 15, 2019, at 12:05 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> Hi Chuck,
> 
> 
> On Mon, 2019-04-15 at 11:04 -0400, Chuck Lever wrote:
>> Just happened again. Any thoughts about where I should start looking?
>> 
>> Mon Apr 15 11:01:40 EDT 2019
>> 4k100test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B,
>> (T) 4096B-4096B, ioengine=libaio, iodepth=1024
>> ...
>> fio-3.1
>> Starting 12 processes
>> 4k100test: Laying out IO file (1 file / 1024MiB)
>> fio: native_fallocate call failed: Operation not supported
>> 4k100test: Laying out IO file (1 file / 1024MiB)
>> fio: native_fallocate call failed: Operation not supported
>> 4k100test: Laying out IO file (1 file / 1024MiB)
>> fio: native_fallocate call failed: Operation not supported
>> 4k100test: Laying out IO file (1 file / 1024MiB)
>> fio: native_fallocate call failed: Operation not supported
>> 4k100test: Laying out IO file (1 file / 1024MiB)
>> fio: native_fallocate call failed: Operation not supported
>> 4k100test: Laying out IO file (1 file / 1024MiB)
>> fio: native_fallocate call failed: Operation not supported
>> 4k100test: Laying out IO file (1 file / 1024MiB)
>> fio: native_fallocate call failed: Operation not supported
>> 4k100test: Laying out IO file (1 file / 1024MiB)
>> fio: native_fallocate call failed: Operation not supported
>> 4k100test: Laying out IO file (1 file / 1024MiB)
>> fio: native_fallocate call failed: Operation not supported
>> 4k100test: Laying out IO file (1 file / 1024MiB)
>> fio: native_fallocate call failed: Operation not supported
>> 4k100test: Laying out IO file (1 file / 1024MiB)
>> fio: native_fallocate call failed: Operation not supported
>> 4k100test: Laying out IO file (1 file / 1024MiB)
>> fio: native_fallocate call failed: Operation not supported
>> fio: io_u error on file 4k100test.7.0: Invalid slot: read
>> offset=938229760, buflen=4096
> 
> Does the following patch fix the race?
> 
> 8<--------------------------------------
> From 4c8759eafad9bb7ea2626a53296e30618aeefcc7 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> Date: Mon, 15 Apr 2019 11:54:13 -0400
> Subject: [PATCH] SUNRPC: Ignore queue transmission errors on successful
> transmission
> 
> If a request transmission fails due to write space or slot unavailability
> errors, but the queued task then gets transmitted before it has time to
> process the error in call_transmit_status() or call_bc_transmit_status(),
> we need to suppress the transmission error code to prevent it from leaking
> out of the RPC layer.
> 
> Reported-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> ---
> net/sunrpc/clnt.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index fa900bb44cd5..369a2648dafc 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -2101,8 +2101,8 @@ call_transmit_status(struct rpc_task *task)
> 	 * test first.
> 	 */
> 	if (rpc_task_transmitted(task)) {
> -		if (task->tk_status == 0)
> -			xprt_request_wait_receive(task);
> +		task->tk_status = 0;
> +		xprt_request_wait_receive(task);
> 		return;
> 	}
> 
> @@ -2187,6 +2187,9 @@ call_bc_transmit_status(struct rpc_task *task)
> {
> 	struct rpc_rqst *req = task->tk_rqstp;
> 
> +	if (rpc_task_transmitted(task))
> +		task->tk_status = 0;
> +
> 	dprint_status(task);
> 
> 	switch (task->tk_status) {
> -- 
> 2.20.1
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx

With this patch applied, I haven't been able to reproduce the problem.

Tested-by: Chuck Lever <chuck.lever@xxxxxxxxxx>


--
Chuck Lever
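The race described in the patch's commit message can be replayed in a small user-space model. This is a hypothetical sketch, not kernel code. `struct sketch_task`, `replay_race()`, and the `transmit_status_before()`/`transmit_status_after()` helpers are invented for illustration, and `-EAGAIN` merely stands in for the write-space or slot-unavailability errors the patch mentions. The model mirrors only the shape of the `rpc_task_transmitted()` branch in `call_transmit_status()` before and after the change.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct rpc_task, reduced to the state the
 * race involves. This is NOT the kernel's structure. */
struct sketch_task {
	int tk_status;          /* error recorded by the last transmit attempt */
	bool transmitted;       /* true once the request is actually on the wire */
	bool waiting_for_reply; /* models xprt_request_wait_receive() being called */
};

/* Replays the interleaving from the commit message. -EAGAIN is only a
 * stand-in for a "no write space" or "no free slot" error. */
static struct sketch_task replay_race(void)
{
	struct sketch_task t = { 0, false, false };

	t.tk_status = -EAGAIN; /* 1. our transmit attempt records an error */
	t.transmitted = true;  /* 2. another task flushes the queue, sending ours */
	return t;              /* 3. only now does our task check its status */
}

/* Shape of the check before the patch: a sent request waits for its reply
 * only if tk_status is 0, so the stale error stays and leaks upward. */
static int transmit_status_before(struct sketch_task *task)
{
	if (task->transmitted) {
		if (task->tk_status == 0)
			task->waiting_for_reply = true;
		return task->tk_status;
	}
	return task->tk_status; /* untransmitted path: not modelled */
}

/* Shape of the check after the patch: once the request has been sent,
 * any earlier queueing error is obsolete, so clear it and wait. */
static int transmit_status_after(struct sketch_task *task)
{
	if (task->transmitted) {
		task->tk_status = 0;
		task->waiting_for_reply = true;
		return task->tk_status;
	}
	return task->tk_status; /* untransmitted path: not modelled */
}
```

With the pre-patch logic the replayed task keeps `-EAGAIN` and never waits for its reply, so the error escapes the RPC layer. With the patched logic the stale error is cleared and the task waits normally. The real `call_transmit_status()` and `call_bc_transmit_status()` have many more branches that this model ignores.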