Re: [Bug 30882] Automatic process group scheduling causes crashes after a while

Ian Kent <raven@xxxxxxxxxx> · Wed, 16 Mar 2011 23:44:39 +0800

On Wed, 2011-03-16 at 17:29 +0200, Mehmet Giritli wrote:
> On Wed, 2011-03-16 at 23:21 +0800, Ian Kent wrote:
> > On Wed, 2011-03-16 at 16:27 +0200, Mehmet Giritli wrote:
> > > Ian,
> > > 
> > > I am having much more frequent crashes now. I haven't been able to
> > > cleanly reboot my machine yet and I have tried three times so far. Init
> > > scripts fail to unmount the file systems and I have to reboot manually
> > 
> > What do your autofs maps look like?
> > 
> > 
> 
> Here are the contents of my auto.misc:
> 
> gollum-media            -rsize=8192,wsize=8192,soft,timeo=10,rw         gollum.giritli.eu:/mnt/media
> gollum-distfiles        -rsize=8192,wsize=8192,soft,timeo=10,rw         gollum.giritli.eu:/usr/portage/distfiles
> gollum-www              -rsize=8192,wsize=8192,soft,timeo=10,rw         gollum.giritli.eu:/var/www
> gollum-WebDav           -rsize=8192,wsize=8192,soft,timeo=10,rw         gollum.giritli.eu:/var/dav

What, that's it? And you're only using "/misc    /etc/auto.misc" in the
master map, and you're having problems?

Are the crashes always the same?
How have you established that the BUG()s are in fact due to automount
umounting mounts and that the BUG()s correspond to NFS mounts previously
mounted by autofs?
Is there any noise at all in the syslog?
Are you sure you're using a kernel with the dentry leak patch?
What sort of automounting load is happening on the machine, i.e.
frequency of mounts and umounts, and what timeout are you using?

The dentry leak patch got rid of the BUG()s I was seeing but by that
time I did have a couple of other patches. I still don't think the other
patches made much difference for this particular case.
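
For anyone following along, the shape of the leak that patch addresses can
be shown in a few lines. This is only a minimal userspace sketch, not the
kernel code: struct obj, obj_get(), obj_put(), expire_leaky() and
expire_fixed() are hypothetical stand-ins for the dentry, dget(), dput()
and autofs4_expire_direct() before and after the patch, and locking is
left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry: only the reference count matters. */
struct obj {
	int refcount;
	int pending;	/* models the AUTOFS_INF_PENDING flag being set */
	int busy;	/* models autofs4_direct_busy() returning true */
};

/* Take a reference, as dget() would. */
static struct obj *obj_get(struct obj *o)
{
	o->refcount++;
	return o;
}

/* Drop a reference, as dput() would. */
static void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	o->refcount--;
}

/* Pre-patch shape: the pending case returns without dropping the reference. */
static struct obj *expire_leaky(struct obj *o)
{
	struct obj *root = obj_get(o);

	if (root->pending)
		return NULL;	/* leak: the reference taken above is never put */
	if (!root->busy)
		return root;	/* caller now owns the reference */
	obj_put(root);
	return NULL;
}

/* Post-patch shape: every NULL return passes through the same put. */
static struct obj *expire_fixed(struct obj *o)
{
	struct obj *root = obj_get(o);

	if (root->pending)
		goto out;
	if (!root->busy)
		return root;	/* caller now owns the reference */
out:
	obj_put(root);
	return NULL;
}
```

With pending set, expire_leaky() leaves the count one higher than it
started while expire_fixed() leaves it unchanged. A stray reference of
that kind is what the "still in use" check at umount time complains about.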

> 
> > > 
> > > On Wed, 2011-03-16 at 10:32 +0800, Ian Kent wrote:
> > > > On Wed, 2011-03-16 at 01:54 +0200, Mehmet Giritli wrote:
> > > > > The missing piece is as follows:
> > > > > 
> > > > > Mar 15 22:37:38 mordor kernel: [ 1860.156114] BUG: Dentry
> > > > > ffff88023f96e600{i=25f56f,n=} still in use (1) [unmount of nfs 0:f]
> > > > 
> > > > This might be the same problem I saw and described in rc1.
> > > > However, for me the fs in the BUG() report was autofs.
> > > > Hopefully that just means my autofs setup is different.
> > > > 
> > > > At the moment I believe a dentry leak Al Viro spotted is the cause.
> > > > Please try this patch.
> > > > 
> > > > autofs4 - fix dentry leak in autofs4_expire_direct()
> > > > 
> > > > From: Ian Kent <raven@xxxxxxxxxx>
> > > > 
> > > > There is a missing dput() when returning from autofs4_expire_direct()
> > > > when we see that the dentry is already a pending mount.
> > > > 
> > > > Signed-off-by: Ian Kent <raven@xxxxxxxxxx>
> > > > ---
> > > > 
> > > >  fs/autofs4/expire.c |    7 +++----
> > > >  1 files changed, 3 insertions(+), 4 deletions(-)
> > > > 
> > > > 
> > > > diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
> > > > index c896dd6..c403abc 100644
> > > > --- a/fs/autofs4/expire.c
> > > > +++ b/fs/autofs4/expire.c
> > > > @@ -290,10 +290,8 @@ struct dentry *autofs4_expire_direct(struct super_block *sb,
> > > >  	spin_lock(&sbi->fs_lock);
> > > >  	ino = autofs4_dentry_ino(root);
> > > >  	/* No point expiring a pending mount */
> > > > -	if (ino->flags & AUTOFS_INF_PENDING) {
> > > > -		spin_unlock(&sbi->fs_lock);
> > > > -		return NULL;
> > > > -	}
> > > > +	if (ino->flags & AUTOFS_INF_PENDING)
> > > > +		goto out;
> > > >  	if (!autofs4_direct_busy(mnt, root, timeout, do_now)) {
> > > >  		struct autofs_info *ino = autofs4_dentry_ino(root);
> > > >  		ino->flags |= AUTOFS_INF_EXPIRING;
> > > > @@ -301,6 +299,7 @@ struct dentry *autofs4_expire_direct(struct super_block *sb,
> > > >  		spin_unlock(&sbi->fs_lock);
> > > >  		return root;
> > > >  	}
> > > > +out:
> > > >  	spin_unlock(&sbi->fs_lock);
> > > >  	dput(root);
> > > >  
> > > > 
> > > > > 
> > > > > (sorry for the inconvenience Andrew)
> > > > >  
> > > > > On Tue, 2011-03-15 at 14:24 -0700, Andrew Morton wrote:
> > > > > > (switched to email.  Please respond via emailed reply-to-all, not via the
> > > > > > bugzilla web interface).
> > > > > > 
> > > > > > Seems that we have a nasty involving autofs, nfs and the VFS.
> > > > > > 
> > > > > > Mehmet, the kernel should have printed some diagnostics prior to doing
> > > > > > the BUG() call:
> > > > > > 
> > > > > > 			if (dentry->d_count != 0) {
> > > > > > 				printk(KERN_ERR
> > > > > > 				       "BUG: Dentry %p{i=%lx,n=%s}"
> > > > > > 				       " still in use (%d)"
> > > > > > 				       " [unmount of %s %s]\n",
> > > > > > 				       dentry,
> > > > > > 				       dentry->d_inode ?
> > > > > > 				       dentry->d_inode->i_ino : 0UL,
> > > > > > 				       dentry->d_name.name,
> > > > > > 				       dentry->d_count,
> > > > > > 				       dentry->d_sb->s_type->name,
> > > > > > 				       dentry->d_sb->s_id);
> > > > > > 				BUG();
> > > > > > 			}
> > > > > > 
> > > > > > > Please find those in the log and email them to us - someone might find
> > > > > > it useful.
> > > > > > 
> > > > > > 
> > > > > > On Tue, 15 Mar 2011 21:02:23 GMT
> > > > > > bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> > > > > > 
> > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=30882
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > --- Comment #4 from Mehmet Giritli <mehmet@xxxxxxxxxx>  2011-03-15 21:02:22 ---
> > > > > > > Here is that crash happening again, the system was NOT running overclocked or
> > > > > > > anything...
> > > > > > > 
> > > > > > > [ 1860.156122] ------------[ cut here ]------------
> > > > > > > [ 1860.156124] kernel BUG at fs/dcache.c:943!
> > > > > > > [ 1860.156126] invalid opcode: 0000 [#1] SMP 
> > > > > > > [ 1860.156127] last sysfs file: /sys/devices/platform/it87.552/fan3_input
> > > > > > > [ 1860.156128] CPU 3 
> > > > > > > [ 1860.156129] Modules linked in: iptable_mangle iptable_nat nf_nat ipt_LOG
> > > > > > > xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state xt_mac iptable_filter
> > > > > > > xt_multiport xt_mark xt_conntrack xt_connmark nf_conntrack ip_tables x_tables
> > > > > > > nvidia(P)
> > > > > > > [ 1860.156137] 
> > > > > > > [ 1860.156139] Pid: 7388, comm: umount.nfs Tainted: P            2.6.38-rc8 #9
> > > > > > > Gigabyte Technology Co., Ltd. GA-790FXTA-UD5/GA-790FXTA-UD5
> > > > > > > [ 1860.156142] RIP: 0010:[<ffffffff810e9648>]  [<ffffffff810e9648>]
> > > > > > > shrink_dcache_for_umount_subtree+0x268/0x270
> > > > > > > [ 1860.156147] RSP: 0018:ffff8800be82fe08  EFLAGS: 00010296
> > > > > > > [ 1860.156149] RAX: 0000000000000065 RBX: ffff88023f96e600 RCX:
> > > > > > > 000000000003ffff
> > > > > > > [ 1860.156150] RDX: ffffffff8161f888 RSI: 0000000000000046 RDI:
> > > > > > > ffffffff8174c9f8
> > > > > > > [ 1860.156151] RBP: ffff88023f96e600 R08: 0000000000012c37 R09:
> > > > > > > 0000000000000006
> > > > > > > [ 1860.156152] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > > > > > ffff88023a07f5e0
> > > > > > > [ 1860.156154] R13: ffff88023f96e65c R14: ffff8800be82ff18 R15:
> > > > > > > ffff880211d38740
> > > > > > > [ 1860.156155] FS:  00007f3428cb2700(0000) GS:ffff8800bfac0000(0000)
> > > > > > > knlGS:00000000f74186c0
> > > > > > > [ 1860.156156] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > > > > > [ 1860.156157] CR2: 00007f7c97da1000 CR3: 00000000bea08000 CR4:
> > > > > > > 00000000000006e0
> > > > > > > [ 1860.156159] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > > > > > 0000000000000000
> > > > > > > [ 1860.156160] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > > > > > > 0000000000000400
> > > > > > > [ 1860.156161] Process umount.nfs (pid: 7388, threadinfo ffff8800be82e000, task
> > > > > > > ffff880211fd5640)
> > > > > > > [ 1860.156162] Stack:
> > > > > > > [ 1860.156163]  ffff88020c05ce50 0000000000000000 ffff88023fc07128
> > > > > > > ffff88020c05cc00
> > > > > > > [ 1860.156165]  ffff88023f96e6c0 ffff8800be82ff28 ffff88023f96e300
> > > > > > > ffffffff810e96a4
> > > > > > > [ 1860.156167]  ffff88023f49f480 ffff88020c05cc00 ffffffff8146d4a0
> > > > > > > ffffffff810d5d15
> > > > > > > [ 1860.156169] Call Trace:
> > > > > > > [ 1860.156172]  [<ffffffff810e96a4>] ? shrink_dcache_for_umount+0x54/0x60
> > > > > > > [ 1860.156174]  [<ffffffff810d5d15>] ? generic_shutdown_super+0x25/0x100
> > > > > > > [ 1860.156176]  [<ffffffff810d5e79>] ? kill_anon_super+0x9/0x40
> > > > > > > [ 1860.156179]  [<ffffffff81179aed>] ? nfs_kill_super+0xd/0x20
> > > > > > > [ 1860.156181]  [<ffffffff810d5f13>] ? deactivate_locked_super+0x43/0x70
> > > > > > > [ 1860.156183]  [<ffffffff810ef4d8>] ? release_mounts+0x68/0x90
> > > > > > > [ 1860.156185]  [<ffffffff810efa54>] ? sys_umount+0x314/0x3d0
> > > > > > > [ 1860.156187]  [<ffffffff8100243b>] ? system_call_fastpath+0x16/0x1b
> > > > > > > [ 1860.156188] Code: 8b 0a 31 d2 48 85 f6 74 07 48 8b 96 a8 00 00 00 48 05 50
> > > > > > > 02 00 00 48 89 de 48 c7 c7 40 3a 52 81 48 89 04 24 31 c0 e8 a1 bc 35 00 <0f> 0b
> > > > > > > eb fe 0f 0b eb fe 55 53 48 89 fb 48 8d 7f 68 48 83 ec 08 
> > > > > > > [ 1860.156201] RIP  [<ffffffff810e9648>]
> > > > > > > shrink_dcache_for_umount_subtree+0x268/0x270
> > > > > > > [ 1860.156204]  RSP <ffff8800be82fe08>
> > > > > > > [ 1860.156205] ---[ end trace ee03486c16c108a7 ]---
> > > > > > > 
> > > > > > > -- 
> > > > > > > Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
> > > > > > > ------- You are receiving this mail because: -------
> > > > > > > You are on the CC list for the bug.
> > > > > > 
> > > > > 
> > > > > 
> > > > 
> > > > 
> > > 
> > > 
> > 
> > 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
