Re: [PATCH/RFC] Don't try to recover NFS locks when they are lost.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 15 Aug 2013 08:47:06 -0400 Jeff Layton <jlayton@xxxxxxxxxx> wrote:

> On Thu, 15 Aug 2013 12:36:04 +1000
> NeilBrown <neilb@xxxxxxx> wrote:
> 
> > 
> > 
> > When an NFS (V4 specifically) client loses contact with the server it can
> > lose any locks that it holds.
> > Currently when it reconnects to the server it simply tries to reclaim
> > those locks.  This might succeed even though some other client has held and
> > released a lock in the mean time.  So the first client might think the file
> > is unchanged, but it isn't.  This isn't good.
> > 
> > If, when recovery happens, the locks cannot be claimed because some other
> > client still holds the lock, then  we get a message in the kernel logs, but
> > the client can still write.  So two clients can both think they have a lock
> > and can both write at the same time.  This is equally not good.
> > 
> > There was a patch a while ago
> >   http://comments.gmane.org/gmane.linux.nfs/41917
> > 
> > which tried to address some of this, but it didn't seem to go anywhere.
> > That patch would also send a signal to the process.  That might be useful
> > but I'm really just interested in failing the writes.
> > For NFSv4 (unlike v2/v3) there is a strong link between the lock and the
> > write request so we can fairly easily fail an IO of the lock is gone.
> > 
> > The patch below attempts to do this.  Does it make sense?
> > Because this is a fairly big change I introduces a module parameter
> > "recover_locks" which defaults to true (the current behaviour) but can be set
> > to "false" to tell the client not to try to recover things that were lost.
> > 
> > Comments?
> > 
> > Thanks,
> > NeilBrown
> > 
> > 
> 
> Failing a read or write when we can't recover a lock over the range
> seems reasonable to me. IIUC though, you're also saying that we
> shouldn't try to recover locks when the lease has expired? If so, then
> that seems wrong...
> 
> Isn't it the responsibility of the server to not allow a lock to be
> reclaimed when there has been a conflicting lock in the interim? It's
> quite possible (and even advantageous) for a server to hold onto a lock
> for a client that has missed renewing its lease when no other client has
> made a conflicting lock request.

Hi Jeff,
I had thought that too.  But when I looked I could find no evidence for it.
The only time a client can 'reclaim' a lock is during the grace period when
the server might have lost the lock due to a reboot.
The case I'm looking at is when neither host rebooted but there was a network
partition.
I think that if the server is to preserve the lock while no other client
contends it, it has to preserve the whole state and not return
NFS4ERR_EXPIRED.
Once the client gets NFS4ERR_EXPIRED it must assume that all related locks
may have been subject to conflicting locks from other clients.

Thanks,
NeilBrown


> 
> > 
> > diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
> > index f5c84c3..de0229b 100644
> > --- a/fs/nfs/nfs3proc.c
> > +++ b/fs/nfs/nfs3proc.c
> > @@ -826,9 +826,10 @@ static void nfs3_proc_read_setup(struct nfs_read_data *data, struct rpc_message
> >  	msg->rpc_proc = &nfs3_procedures[NFS3PROC_READ];
> >  }
> >  
> > -static void nfs3_proc_read_rpc_prepare(struct rpc_task *task, struct nfs_read_data *data)
> > +static int nfs3_proc_read_rpc_prepare(struct rpc_task *task, struct nfs_read_data *data)
> >  {
> >  	rpc_call_start(task);
> > +	return 0;
> >  }
> >  
> >  static int nfs3_write_done(struct rpc_task *task, struct nfs_write_data *data)
> > @@ -847,9 +848,10 @@ static void nfs3_proc_write_setup(struct nfs_write_data *data, struct rpc_messag
> >  	msg->rpc_proc = &nfs3_procedures[NFS3PROC_WRITE];
> >  }
> >  
> > -static void nfs3_proc_write_rpc_prepare(struct rpc_task *task, struct nfs_write_data *data)
> > +static int nfs3_proc_write_rpc_prepare(struct rpc_task *task, struct nfs_write_data *data)
> >  {
> >  	rpc_call_start(task);
> > +	return 0;
> >  }
> >  
> >  static void nfs3_proc_commit_rpc_prepare(struct rpc_task *task, struct nfs_commit_data *data)
> > diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
> > index ee81e35..a468b345 100644
> > --- a/fs/nfs/nfs4_fs.h
> > +++ b/fs/nfs/nfs4_fs.h
> > @@ -135,6 +135,7 @@ struct nfs4_lock_state {
> >  	struct list_head	ls_locks;	/* Other lock stateids */
> >  	struct nfs4_state *	ls_state;	/* Pointer to open state */
> >  #define NFS_LOCK_INITIALIZED 0
> > +#define NFS_LOCK_LOST        1
> >  	unsigned long		ls_flags;
> >  	struct nfs_seqid_counter	ls_seqid;
> >  	nfs4_stateid		ls_stateid;
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index cf11799..bcbcd07 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -3907,15 +3907,19 @@ static void nfs4_proc_read_setup(struct nfs_read_data *data, struct rpc_message
> >  	nfs41_init_sequence(&data->args.seq_args, &data->res.seq_res, 0);
> >  }
> >  
> > -static void nfs4_proc_read_rpc_prepare(struct rpc_task *task, struct nfs_read_data *data)
> > +static int nfs4_proc_read_rpc_prepare(struct rpc_task *task, struct nfs_read_data *data)
> >  {
> >  	if (nfs4_setup_sequence(NFS_SERVER(data->header->inode),
> >  			&data->args.seq_args,
> >  			&data->res.seq_res,
> >  			task))
> > -		return;
> > -	nfs4_set_rw_stateid(&data->args.stateid, data->args.context,
> > -			data->args.lock_context, FMODE_READ);
> > +		return 0;
> > +	if (nfs4_set_rw_stateid(&data->args.stateid, data->args.context,
> > +				data->args.lock_context, FMODE_READ) == -EIO)
> > +		return -EIO;
> > +	if (unlikely(test_bit(NFS_CONTEXT_BAD, &data->args.context->flags)))
> > +		return -EIO;
> > +	return 0;
> >  }
> >  
> >  static int nfs4_write_done_cb(struct rpc_task *task, struct nfs_write_data *data)
> > @@ -3990,15 +3994,19 @@ static void nfs4_proc_write_setup(struct nfs_write_data *data, struct rpc_messag
> >  	nfs41_init_sequence(&data->args.seq_args, &data->res.seq_res, 1);
> >  }
> >  
> > -static void nfs4_proc_write_rpc_prepare(struct rpc_task *task, struct nfs_write_data *data)
> > +static int nfs4_proc_write_rpc_prepare(struct rpc_task *task, struct nfs_write_data *data)
> >  {
> >  	if (nfs4_setup_sequence(NFS_SERVER(data->header->inode),
> >  			&data->args.seq_args,
> >  			&data->res.seq_res,
> >  			task))
> > -		return;
> > -	nfs4_set_rw_stateid(&data->args.stateid, data->args.context,
> > -			data->args.lock_context, FMODE_WRITE);
> > +		return 0;
> > +	if (nfs4_set_rw_stateid(&data->args.stateid, data->args.context,
> > +				data->args.lock_context, FMODE_WRITE) == -EIO)
> > +		return -EIO;
> > +	if (unlikely(test_bit(NFS_CONTEXT_BAD, &data->args.context->flags)))
> > +		return -EIO;
> > +	return 0;
> >  }
> >  
> >  static void nfs4_proc_commit_rpc_prepare(struct rpc_task *task, struct nfs_commit_data *data)
> > @@ -5380,6 +5388,11 @@ static int nfs4_lock_reclaim(struct nfs4_state *state, struct file_lock *request
> >  	return err;
> >  }
> >  
> > +bool recover_locks = true;
> > +module_param(recover_locks, bool, 0644);
> > +MODULE_PARM_DESC(recovery_locks,
> > +		 "If the server reports that a lock might be lost, "
> > +		 "try to recovery it risking corruption.");
> >  static int nfs4_lock_expired(struct nfs4_state *state, struct file_lock *request)
> >  {
> >  	struct nfs_server *server = NFS_SERVER(state->inode);
> > @@ -5391,6 +5404,10 @@ static int nfs4_lock_expired(struct nfs4_state *state, struct file_lock *request
> >  	err = nfs4_set_lock_state(state, request);
> >  	if (err != 0)
> >  		return err;
> > +	if (!recover_locks) {
> > +		set_bit(NFS_LOCK_LOST, &request->fl_u.nfs4_fl.owner->ls_flags);
> > +		return 0;
> > +	}
> >  	do {
> >  		if (test_bit(NFS_DELEGATED_STATE, &state->flags) != 0)
> >  			return 0;
> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > index e22862f..4d103ff 100644
> > --- a/fs/nfs/nfs4state.c
> > +++ b/fs/nfs/nfs4state.c
> > @@ -998,7 +998,9 @@ static int nfs4_copy_lock_stateid(nfs4_stateid *dst,
> >  	fl_pid = lockowner->l_pid;
> >  	spin_lock(&state->state_lock);
> >  	lsp = __nfs4_find_lock_state(state, fl_owner, fl_pid, NFS4_ANY_LOCK_TYPE);
> > -	if (lsp != NULL && test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags) != 0) {
> > +	if (lsp && test_bit(NFS_LOCK_LOST, &lsp->ls_flags))
> > +		ret = -EIO;
> > +	else if (lsp != NULL && test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags) != 0) {
> >  		nfs4_stateid_copy(dst, &lsp->ls_stateid);
> >  		ret = 0;
> >  		smp_rmb();
> > diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
> > index c041c41..a8f57c7 100644
> > --- a/fs/nfs/proc.c
> > +++ b/fs/nfs/proc.c
> > @@ -623,9 +623,10 @@ static void nfs_proc_read_setup(struct nfs_read_data *data, struct rpc_message *
> >  	msg->rpc_proc = &nfs_procedures[NFSPROC_READ];
> >  }
> >  
> > -static void nfs_proc_read_rpc_prepare(struct rpc_task *task, struct nfs_read_data *data)
> > +static int nfs_proc_read_rpc_prepare(struct rpc_task *task, struct nfs_read_data *data)
> >  {
> >  	rpc_call_start(task);
> > +	return 0;
> >  }
> >  
> >  static int nfs_write_done(struct rpc_task *task, struct nfs_write_data *data)
> > @@ -644,9 +645,10 @@ static void nfs_proc_write_setup(struct nfs_write_data *data, struct rpc_message
> >  	msg->rpc_proc = &nfs_procedures[NFSPROC_WRITE];
> >  }
> >  
> > -static void nfs_proc_write_rpc_prepare(struct rpc_task *task, struct nfs_write_data *data)
> > +static int nfs_proc_write_rpc_prepare(struct rpc_task *task, struct nfs_write_data *data)
> >  {
> >  	rpc_call_start(task);
> > +	return 0;
> >  }
> >  
> >  static void nfs_proc_commit_rpc_prepare(struct rpc_task *task, struct nfs_commit_data *data)
> > diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> > index 70a26c6..31db5c3 100644
> > --- a/fs/nfs/read.c
> > +++ b/fs/nfs/read.c
> > @@ -513,9 +513,10 @@ static void nfs_readpage_release_common(void *calldata)
> >  void nfs_read_prepare(struct rpc_task *task, void *calldata)
> >  {
> >  	struct nfs_read_data *data = calldata;
> > -	NFS_PROTO(data->header->inode)->read_rpc_prepare(task, data);
> > -	if (unlikely(test_bit(NFS_CONTEXT_BAD, &data->args.context->flags)))
> > -		rpc_exit(task, -EIO);
> > +	int err;
> > +	err = NFS_PROTO(data->header->inode)->read_rpc_prepare(task, data);
> > +	if (err)
> > +		rpc_exit(task, err);
> >  }
> >  
> >  static const struct rpc_call_ops nfs_read_common_ops = {
> > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > index f1bdb72..7816801 100644
> > --- a/fs/nfs/write.c
> > +++ b/fs/nfs/write.c
> > @@ -1265,9 +1265,10 @@ EXPORT_SYMBOL_GPL(nfs_pageio_reset_write_mds);
> >  void nfs_write_prepare(struct rpc_task *task, void *calldata)
> >  {
> >  	struct nfs_write_data *data = calldata;
> > -	NFS_PROTO(data->header->inode)->write_rpc_prepare(task, data);
> > -	if (unlikely(test_bit(NFS_CONTEXT_BAD, &data->args.context->flags)))
> > -		rpc_exit(task, -EIO);
> > +	int err;
> > +	err = NFS_PROTO(data->header->inode)->write_rpc_prepare(task, data);
> > +	if (err)
> > +		rpc_exit(task, err);
> >  }
> >  
> >  void nfs_commit_prepare(struct rpc_task *task, void *calldata)
> > diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> > index 8651574..c71e12b 100644
> > --- a/include/linux/nfs_xdr.h
> > +++ b/include/linux/nfs_xdr.h
> > @@ -1419,12 +1419,12 @@ struct nfs_rpc_ops {
> >  	void	(*read_setup)   (struct nfs_read_data *, struct rpc_message *);
> >  	void	(*read_pageio_init)(struct nfs_pageio_descriptor *, struct inode *,
> >  				    const struct nfs_pgio_completion_ops *);
> > -	void	(*read_rpc_prepare)(struct rpc_task *, struct nfs_read_data *);
> > +	int	(*read_rpc_prepare)(struct rpc_task *, struct nfs_read_data *);
> >  	int	(*read_done)  (struct rpc_task *, struct nfs_read_data *);
> >  	void	(*write_setup)  (struct nfs_write_data *, struct rpc_message *);
> >  	void	(*write_pageio_init)(struct nfs_pageio_descriptor *, struct inode *, int,
> >  				     const struct nfs_pgio_completion_ops *);
> > -	void	(*write_rpc_prepare)(struct rpc_task *, struct nfs_write_data *);
> > +	int	(*write_rpc_prepare)(struct rpc_task *, struct nfs_write_data *);
> >  	int	(*write_done)  (struct rpc_task *, struct nfs_write_data *);
> >  	void	(*commit_setup) (struct nfs_commit_data *, struct rpc_message *);
> >  	void	(*commit_rpc_prepare)(struct rpc_task *, struct nfs_commit_data *);
> 
> 

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux