On Tue, Aug 23, 2011 at 11:06:41AM +1200, Malcolm Locke wrote:
> On Mon, Aug 15, 2011 at 09:21:30AM -0400, J. Bruce Fields wrote:
> > On Tue, Aug 09, 2011 at 12:51:14AM +1200, Malcolm Locke wrote:
> > > First off, apologies for bringing such mundane matters to the list, but
> > > we're at the end of our tethers and way out of our depth on this.  We
> > > have a problem on our production machine that we are unable to replicate
> > > on a test machine, and would greatly appreciate any pointers of where to
> > > look next.
> > >
> > > We're in the process of upgrading a DRBD pair running Ubuntu hardy to
> > > Debian squeeze.  The first of the pair has been upgraded, and NFS works
> > > correctly except for locking.  Calls to flock() from any client on an
> > > NFS mount hang indefinitely.
> > >
> > > We've installed a fresh Debian squeeze machine to test, but are
> > > completely unable to reproduce the issue.
>
> OK, I've finally managed to reproduce this on our test machine.  Given
> the package list below:
>
> > > Pertinent details about the set up:
> > >
> > > Kernel on both machines:
> > > Linux debian 2.6.32-5-openvz-amd64 #1 SMP Tue Jun 14 10:46:15 UTC 2011
> > > x86_64 GNU/Linux
> > >
> > > Debian package versions:
> > > nfs-common         1.2.2-4
> > > nfs-kernel-server  1.2.2-4
> > > rpcbind            0.2.0-4.1
>
> And the following /etc/exports:
>
> /home 192.168.200.0/24(rw,no_root_squash,async,no_subtree_check)
> /nfs4 192.168.200.0/24(rw,sync,fsid=0,crossmnt)
> /nfs4/flum 192.168.200.0/24(rw,sync)
>
> After a fresh boot:
>
> # Just mount and unmount a v4 mount (192.168.200.187 == localhost)
> $ mount -t nfs4 192.168.200.187:/flum /mnt
> $ umount /mnt
>
> $ /etc/init.d/nfs-kernel-server stop
> # Comment out the v4 entries from /etc/exports, so only /home remains,
> # and restart the server so v4 is disabled.
> $ /etc/init.d/nfs-kernel-server start
>
> # Mount with v3
> $ mount 192.168.200.187:/home /mnt
>
> # Now trying to flock() will fail, with the server staying in its grace
> # period ad infinitum
> $ flock /mnt/foo ls
>
> I'm not sure if this is the exact sequence of events we had to get
> things stuck on our production machine (it's possible), but this
> sequence will always get the server into an indefinite grace period for me.
>
> > It might be worth trying this in addition to the recoverydir fixes
> > previously posted.
>
> Thanks, I haven't had the opportunity to try this yet but will do so on
> the test machine and report back if I get time.

Have you gotten a chance to try this?

--b.

> > commit c52560f10794b9fb8c050532d27ff999d8f5c23c
> > Author: J. Bruce Fields <bfields@xxxxxxxxxx>
> > Date:   Fri Aug 12 11:59:44 2011 -0400
> >
> >     some grace period fixes and debugging
> >
> > diff --git a/fs/lockd/grace.c b/fs/lockd/grace.c
> > index 183cc1f..61272f7 100644
> > --- a/fs/lockd/grace.c
> > +++ b/fs/lockd/grace.c
> > @@ -22,6 +22,7 @@ static DEFINE_SPINLOCK(grace_lock);
> >  void locks_start_grace(struct lock_manager *lm)
> >  {
> >  	spin_lock(&grace_lock);
> > +	printk("%s starting grace period\n", lm->name);
> >  	list_add(&lm->list, &grace_list);
> >  	spin_unlock(&grace_lock);
> >  }
> > @@ -40,6 +41,7 @@ EXPORT_SYMBOL_GPL(locks_start_grace);
> >  void locks_end_grace(struct lock_manager *lm)
> >  {
> >  	spin_lock(&grace_lock);
> > +	printk("%s ending grace period\n", lm->name);
> >  	list_del_init(&lm->list);
> >  	spin_unlock(&grace_lock);
> >  }
> > @@ -54,6 +56,15 @@ EXPORT_SYMBOL_GPL(locks_end_grace);
> >   */
> >  int locks_in_grace(void)
> >  {
> > -	return !list_empty(&grace_list);
> > +	if (!list_empty(&grace_list)) {
> > +		struct lock_manager *lm;
> > +
> > +		printk("in grace period due to: ");
> > +		list_for_each_entry(lm, &grace_list, list)
> > +			printk("%s ", lm->name);
> > +		printk("\n");
> > +		return 1;
> > +	}
> > +	return 0;
> >  }
> >  EXPORT_SYMBOL_GPL(locks_in_grace);
> > diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> > index c061b9a..1638929 100644
> > --- a/fs/lockd/svc.c
> > +++ b/fs/lockd/svc.c
> > @@ -84,6 +84,7 @@ static unsigned long get_lockd_grace_period(void)
> >  }
> >
> >  static struct lock_manager lockd_manager = {
> > +	.name = "lockd"
> >  };
> >
> >  static void grace_ender(struct work_struct *not_used)
> > @@ -97,8 +98,8 @@ static void set_grace_period(void)
> >  {
> >  	unsigned long grace_period = get_lockd_grace_period();
> >
> > -	locks_start_grace(&lockd_manager);
> >  	cancel_delayed_work_sync(&grace_period_end);
> > +	locks_start_grace(&lockd_manager);
> >  	schedule_delayed_work(&grace_period_end, grace_period);
> >  }
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 3787ec1..b83ffdf 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -2942,6 +2942,7 @@ out:
> >  }
> >
> >  static struct lock_manager nfsd4_manager = {
> > +	.name = "nfsd4",
> >  };
> >
> >  static void
> > @@ -4563,7 +4564,6 @@ __nfs4_state_start(void)
> >  	int ret;
> >
> >  	boot_time = get_seconds();
> > -	locks_start_grace(&nfsd4_manager);
> >  	printk(KERN_INFO "NFSD: starting %ld-second grace period\n",
> >  	       nfsd4_grace);
> >  	ret = set_callback_cred();
> > @@ -4575,6 +4575,7 @@ __nfs4_state_start(void)
> >  	ret = nfsd4_create_callback_queue();
> >  	if (ret)
> >  		goto out_free_laundry;
> > +	locks_start_grace(&nfsd4_manager);
> >  	queue_delayed_work(laundry_wq, &laundromat_work, nfsd4_grace * HZ);
> >  	set_max_delegations();
> >  	return 0;
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index ad35091..9501aa7 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1098,6 +1098,7 @@ struct lock_manager_operations {
> >  };
> >
> >  struct lock_manager {
> > +	char *name;
> >  	struct list_head list;
> >  };
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html