Re: [PATCH 2/8] staging/lustre/mdc: fix panic at mdc_free_open()

Actually, please do not apply this one. A testing error kept me from noticing
that it has a bug that crashes everything instantly on access.

I tested the rest, and they are good without this one too.

Sorry about this.

On Aug 23, 2016, at 5:11 PM, Oleg Drokin wrote:

> From: Alexander Boyko <alexander.boyko@xxxxxxxxxxx>
> 
> The following assertion fires for an open request when rq_replay
> is set to 1:
>    ASSERTION(mod->mod_open_req->rq_replay == 0)
> This situation is not fatal for the client, and it can happen
> when mdc_close() fails.
> The fix allows such requests to be freed. If mdc_close() fails, the
> MDS does not receive the close request from the client, and in the
> worst case the client will be evicted.
> 
> The test recreates the issue in which mdc_close() fails and the
> client asserts:
>   ASSERTION( mod->mod_open_req->rq_replay == 0 ) failed
> 
> Signed-off-by: Alexander Boyko <alexander.boyko@xxxxxxxxxxx>
> Seagate-bug-id: MRP-3156
> Reviewed-on: http://review.whamcloud.com/17495
> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5282
> Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@xxxxxxxxx>
> Reviewed-by: Andreas Dilger <andreas.dilger@xxxxxxxxx>
> Signed-off-by: Oleg Drokin <green@xxxxxxxxxxxxxx>
> ---
> .../staging/lustre/lustre/include/obd_support.h    |  1 +
> drivers/staging/lustre/lustre/mdc/mdc_request.c    | 50 ++++++++++++++--------
> 2 files changed, 32 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/include/obd_support.h b/drivers/staging/lustre/lustre/include/obd_support.h
> index 0c29a33..4a9fe88 100644
> --- a/drivers/staging/lustre/lustre/include/obd_support.h
> +++ b/drivers/staging/lustre/lustre/include/obd_support.h
> @@ -402,6 +402,7 @@ extern char obd_jobid_var[];
> #define OBD_FAIL_MDC_GETATTR_ENQUEUE     0x803
> #define OBD_FAIL_MDC_RPCS_SEM		 0x804
> #define OBD_FAIL_MDC_LIGHTWEIGHT	 0x805
> +#define OBD_FAIL_MDC_CLOSE		 0x806
> 
> #define OBD_FAIL_MGS		     0x900
> #define OBD_FAIL_MGS_ALL_REQUEST_NET     0x901
> diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
> index 91c0b45..8369afd 100644
> --- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
> +++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
> @@ -677,9 +677,15 @@ static void mdc_free_open(struct md_open_data *mod)
> 	    imp_connect_disp_stripe(mod->mod_open_req->rq_import))
> 		committed = 1;
> 
> -	LASSERT(mod->mod_open_req->rq_replay == 0);
> -
> -	DEBUG_REQ(D_RPCTRACE, mod->mod_open_req, "free open request\n");
> +	/*
> +	 * No reason to assert here if the open request has
> +	 * rq_replay == 1. It means that mdc_close failed, and
> +	 * close request wasn't sent. It is not fatal to client.
> +	 * The worst thing is eviction if the client gets an open lock
> +	 */
> +	DEBUG_REQ(D_RPCTRACE, mod->mod_open_req,
> +		  "free open request rq_replay = %d\n",
> +		   mod->mod_open_req->rq_replay);
> 
> 	ptlrpc_request_committed(mod->mod_open_req, committed);
> 	if (mod->mod_close_req)
> @@ -749,22 +755,10 @@ static int mdc_close(struct obd_export *exp, struct md_op_data *op_data,
> 	}
> 
> 	*request = NULL;
> -	req = ptlrpc_request_alloc(class_exp2cliimp(exp), req_fmt);
> -	if (!req)
> -		return -ENOMEM;
> -
> -	rc = ptlrpc_request_pack(req, LUSTRE_MDS_VERSION, MDS_CLOSE);
> -	if (rc) {
> -		ptlrpc_request_free(req);
> -		return rc;
> -	}
> -
> -	/* To avoid a livelock (bug 7034), we need to send CLOSE RPCs to a
> -	 * portal whose threads are not taking any DLM locks and are therefore
> -	 * always progressing
> -	 */
> -	req->rq_request_portal = MDS_READPAGE_PORTAL;
> -	ptlrpc_at_set_req_timeout(req);
> +	if (OBD_FAIL_CHECK(OBD_FAIL_MDC_CLOSE))
> +		req = NULL;
> +	else
> +		req = ptlrpc_request_alloc(class_exp2cliimp(exp), req_fmt);
> 
> 	/* Ensure that this close's handle is fixed up during replay. */
> 	if (likely(mod)) {
> @@ -785,6 +779,23 @@ static int mdc_close(struct obd_export *exp, struct md_op_data *op_data,
> 		 CDEBUG(D_HA,
> 			"couldn't find open req; expecting close error\n");
> 	}
> +	if (!req) {
> +		/*
> +		 * TODO: repeat close after errors
> +		 */
> +		CWARN("%s: close of FID "DFID" failed, file reference will be dropped when this client unmounts or is evicted\n",
> +		      obd->obd_name, PFID(&op_data->op_fid1));
> +		rc = -ENOMEM;
> +		goto out;
> +	}
> +
> +	/*
> +	 * To avoid a livelock (bug 7034), we need to send CLOSE RPCs to a
> +	 * portal whose threads are not taking any DLM locks and are therefore
> +	 * always progressing
> +	 */
> +	req->rq_request_portal = MDS_READPAGE_PORTAL;
> +	ptlrpc_at_set_req_timeout(req);
> 
> 	mdc_close_pack(req, op_data);
> 
> @@ -830,6 +841,7 @@ static int mdc_close(struct obd_export *exp, struct md_op_data *op_data,
> 		}
> 	}
> 
> +out:
> 	if (mod) {
> 		if (rc != 0)
> 			mod->mod_close_req = NULL;
> -- 
> 2.7.4
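For readers following the thread, here is a minimal userspace sketch in C of the control flow the patch is aiming for. Everything in it is hypothetical: `struct fake_req`, `fake_close()`, `fake_free_open()` and the `fail_close` switch only loosely model `ptlrpc_request`, `mdc_close()`, `mdc_free_open()` and the `OBD_FAIL_MDC_CLOSE` fail_loc, and none of it is Lustre API. It shows an allocation failure (real or injected) being unwound through a single `out` label, and the open request being freed with only a log of `rq_replay` instead of an assertion. The sketch packs the request right after allocating it; in the quoted diff, the `ptlrpc_request_pack()` call appears among the removed lines of the `mdc_close()` hunk and is not re-added before `mdc_close_pack()`, which may be worth checking against the crash reported above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace model of the patch; none of this is Lustre API. */
struct fake_req {
	int rq_replay;	/* 1 while the open request is kept for replay */
	bool packed;	/* set once the request buffer has been packed */
};

struct fake_open_data {
	struct fake_req *open_req;
	struct fake_req *close_req;
};

/* Hypothetical switch standing in for the OBD_FAIL_MDC_CLOSE fail_loc. */
static bool fail_close;

static struct fake_req *fake_req_alloc(void)
{
	return calloc(1, sizeof(struct fake_req));
}

/* Stand-in for packing the request; always succeeds in this model. */
static int fake_req_pack(struct fake_req *req)
{
	req->packed = true;
	return 0;
}

/* Close path: a failed (or injected) allocation is logged and unwound
 * through one "out" label, leaving the open request replayable. */
static int fake_close(struct fake_open_data *mod)
{
	struct fake_req *req;
	int rc;

	assert(mod && mod->open_req);	/* this model always has open data */

	req = fail_close ? NULL : fake_req_alloc();
	if (!req) {
		fprintf(stderr, "close failed; reference dropped at unmount/eviction\n");
		rc = -ENOMEM;
		goto out;
	}

	/* Pack before anything fills in the request body. */
	rc = fake_req_pack(req);
	if (rc) {
		free(req);
		goto out;
	}

	mod->close_req = req;
	/* Simplification: pretend the close was sent and committed. */
	mod->open_req->rq_replay = 0;
	return 0;

out:
	mod->close_req = NULL;	/* mirrors the patch's out: label on error */
	return rc;
}

/* Free path after the patch: rq_replay is logged, not asserted. */
static void fake_free_open(struct fake_open_data *mod)
{
	printf("free open request rq_replay = %d\n", mod->open_req->rq_replay);
	free(mod->open_req);
	mod->open_req = NULL;
	free(mod->close_req);
	mod->close_req = NULL;
}
```

Calling `fake_close()` with `fail_close` set returns `-ENOMEM` and leaves `rq_replay` at 1; `fake_free_open()` then logs that value and frees the request instead of asserting, which mirrors the behaviour the commit message describes.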

_______________________________________________
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxx
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


