(2011/08/05 22:28), Frank van Maarseveen wrote:
> On Thu, Aug 04, 2011 at 02:17:35PM -0400, Trond Myklebust wrote:
>> On Thu, 2011-08-04 at 19:27 +0200, Frank van Maarseveen wrote:
>> > On Thu, Aug 04, 2011 at 01:10:20PM -0400, Trond Myklebust wrote:
>> > > On Thu, 2011-08-04 at 12:49 -0400, J. Bruce Fields wrote:
>> > > > On Thu, Aug 04, 2011 at 06:43:13PM +0200, Frank van Maarseveen wrote:
>> > > > > On Thu, Aug 04, 2011 at 12:34:52PM -0400, J. Bruce Fields wrote:
>> > > > > > On Thu, Aug 04, 2011 at 12:30:19PM +0200, Frank van Maarseveen wrote:
>> > > > > > > Both client- and server run 2.6.39.3, NFSv3 over UDP (without the
>> > > > > > > relock_filesystem patch proposed earlier).
>> > > > > > >
>> > > > > > > A second client has an exclusive lock on a file on the server. The
>> > > > > > > client under test calls fcntl(F_SETLKW) to wait for the same exclusive
>> > > > > > > lock. Wireshark sees NLM V4 LOCK calls resulting in NLM_BLOCKED.
>> > > > > > >
>> > > > > > > Next the server is rebooted. The second client recovers the lock
>> > > > > > > correctly. The client under test now receives NLM_DENIED_GRACE_PERIOD for
>> > > > > > > every NLM V4 LOCK request resulting from the waiting fcntl(F_SETLKW). When
>> > > > > > > this changes to NLM_BLOCKED after grace period expiration the fcntl
>> > > > > > > returns -ENOLCK ("No locks available.") instead of continuing to wait.
>> > > > > >
>> > > > > > So that sounds like a client bug, and correct behavior from the server
>> > > > > > (assuming the second client was still holding the lock throughout).
>> > > > >
>> > > > > yes.
>> > >
>> > > Is the client actually asking for a blocking lock after the grace period
>> > > expires?
>> >
>> > yes, according to my interpretation of that of wireshark, see reply to Bruce.
>> >
>>
>> OK... Does the following patch help?
>>
>> Cheers
>>   Trond
>> ---
>> diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
>> index 8392cb8..40c0d88 100644
>> --- a/fs/lockd/clntproc.c
>> +++ b/fs/lockd/clntproc.c
>> @@ -270,6 +270,9 @@ nlmclnt_call(struct rpc_cred *cred, struct nlm_rqst *req, u32 proc)
>>  			return -ENOLCK;
>>  		msg.rpc_proc = &clnt->cl_procinfo[proc];
>>  
>> +		/* Reset the reply status */
>> +		if (argp->block)
>> +			resp->status = nlm_lck_blocked;
>>  		/* Perform the RPC call. If an error occurs, try again */
>>  		if ((status = rpc_call_sync(clnt, &msg, 0)) < 0) {
>>  			dprintk("lockd: rpc_call returned error %d\n", -status);
>>
>
> Negative. I've tried it on the client under test and I'm seeing three
> types of behavior, one good, two bad. In all cases the secondary
> client (unmodified) correctly regains the lock after the server has
> rebooted. Client under test behavior depends on whether it had queued
> the conflicting lock before or after the server reboot. Afterwards it
> seems to work with the above modification (don't know if that was the
> case before though).
>
> When the client under test tries to lock before the server reboot then
> the fcntl(F_SETLKW) returns either right after the NSM NOTIFY with
> -ENOLCK without any NLM traffic or it returns with -ENOLCK when the
> NLM_DENIED_GRACE_PERIOD changes into NLM_BLOCKED (the original report).
>

Hi all,

Was this fixed? I have the same issue with 3.2.9-2.fc16.

When the client receives the NSM NOTIFY, the reclaimer() thread sets
block->b_status to nlm_lck_denied_grace_period and wakes the waiter:

fs/lockd/clntlock.c
	/* Now, wake up all processes that sleep on a blocked lock */
	spin_lock(&nlm_blocked_lock);
	list_for_each_entry(block, &nlm_blocked, b_list) {
		if (block->b_host == host) {
			block->b_status = nlm_lck_denied_grace_period;	/* <-- */
			wake_up(&block->b_wait);
		}
	}
	spin_unlock(&nlm_blocked_lock);

The blocked process then loops inside nlmclnt_call() while the server is
in its grace period, and receives NLM_BLOCKED again once the grace period
is over.
Then nlmclnt_block() copies block->b_status (still
nlm_lck_denied_grace_period) to req->a_res.status:

fs/lockd/clntlock.c
	ret = wait_event_interruptible_timeout(block->b_wait,
			block->b_status != nlm_lck_blocked,
			timeout);
	if (ret < 0)
		return -ERESTARTSYS;
	req->a_res.status = block->b_status;	/* <-- */
	return 0;

... and nlmclnt_lock() breaks out of its retry loop and returns -ENOLCK:

fs/lockd/clntproc.c
		/* Wait on an NLM blocking lock */
		status = nlmclnt_block(block, req, NLMCLNT_POLL_TIMEOUT);
		if (status < 0)
			break;
		if (resp->status != nlm_lck_blocked)	/* <-- */
			break;				/* <-- */
	}
...
	if (resp->status == nlm_lck_denied && (fl_flags & FL_SLEEP))
		status = -ENOLCK;
	else
		status = nlm_stat_to_errno(resp->status);	/* <-- */
out_unblock:
	nlmclnt_finish_block(block);
out:
	nlmclnt_release_call(req);
	return status;	/* <-- */

The following patch works fine on my FC16 machine. Instead of having
reclaimer() overwrite block->b_status, nlmclnt_block() records the host's
NSM state before sleeping and also wakes up when that state changes.

--- a/fs/lockd/clntlock.c	2012-01-04 23:55:44.000000000 +0000
+++ b/fs/lockd/clntlock.c	2012-03-16 08:08:03.793687409 +0000
@@ -121,6 +121,7 @@
 int nlmclnt_block(struct nlm_wait *block, struct nlm_rqst *req, long timeout)
 {
 	long ret;
+	u32 nsmstate;
 
 	/* A borken server might ask us to block even if we didn't
 	 * request it. Just say no!
@@ -136,8 +137,10 @@
 	 * a 1 minute timeout would do. See the comment before
 	 * nlmclnt_lock for an explanation.
 	 */
+	nsmstate = block->b_host->h_nsmstate;
 	ret = wait_event_interruptible_timeout(block->b_wait,
-			block->b_status != nlm_lck_blocked,
+			block->b_status != nlm_lck_blocked ||
+			block->b_host->h_nsmstate != nsmstate,
 			timeout);
 	if (ret < 0)
 		return -ERESTARTSYS;
@@ -266,7 +269,6 @@
 	spin_lock(&nlm_blocked_lock);
 	list_for_each_entry(block, &nlm_blocked, b_list) {
 		if (block->b_host == host) {
-			block->b_status = nlm_lck_denied_grace_period;
 			wake_up(&block->b_wait);
 		}
 	}

Thanks,
Ichiko