Re: [Bug 30882] Automatic process group scheduling causes crashes after a while

On Wed, 2011-03-16 at 23:44 +0800, Ian Kent wrote:
> On Wed, 2011-03-16 at 17:29 +0200, Mehmet Giritli wrote:
> > On Wed, 2011-03-16 at 23:21 +0800, Ian Kent wrote:
> > > On Wed, 2011-03-16 at 16:27 +0200, Mehmet Giritli wrote:
> > > > Ian,
> > > > 
> > > > I am having much more frequent crashes now. I haven't been able to
> > > > cleanly reboot my machine yet, and I have tried three times so far. Init
> > > > scripts fail to unmount the file systems and I have to reboot manually.
> > > 
> > > What do your autofs maps look like?
> > > 
> > > 
> > 
> > Here are the contents of my auto.misc:
> > 
> > gollum-media            -rsize=8192,wsize=8192,soft,timeo=10,rw         gollum.giritli.eu:/mnt/media
> > gollum-distfiles        -rsize=8192,wsize=8192,soft,timeo=10,rw         gollum.giritli.eu:/usr/portage/distfiles
> > gollum-www              -rsize=8192,wsize=8192,soft,timeo=10,rw         gollum.giritli.eu:/var/www
> > gollum-WebDav           -rsize=8192,wsize=8192,soft,timeo=10,rw         gollum.giritli.eu:/var/dav
> 
> What, that's it? And you're only using "/misc    /etc/auto.misc" in the
> master map and you're having problems?

yes

> 
> Are the crashes always the same?

identical

> How have you established that the BUG()s are in fact due to automount
> umounting mounts and that the BUG()s correspond to NFS mounts previously
> mounted by autofs?

I haven't established anything. However, that's the only way I mount NFS,
and my file manager hangs and init scripts hang when trying to unmount...

> Is there any noise at all in the syslog?

nothing unusual

> Are you sure you're using a kernel with the dentry leak patch?

yes

> What sort of automounting load is happening on the machine, i.e.
> frequency of mounts and umounts, and what timeout are you using?

from auto.master:

/mnt/autofs     /etc/auto.misc  --timeout=300 --ghost

Not very much. Let's say 2-3 times every hour for each mount point.

> The dentry leak patch got rid of the BUG()s I was seeing but by that
> time I did have a couple of other patches. I still don't think the other
> patches made much difference for this particular case.
> 
> > 
> > > > 
> > > > On Wed, 2011-03-16 at 10:32 +0800, Ian Kent wrote:
> > > > > On Wed, 2011-03-16 at 01:54 +0200, Mehmet Giritli wrote:
> > > > > > The missing piece is as follows:
> > > > > > 
> > > > > > Mar 15 22:37:38 mordor kernel: [ 1860.156114] BUG: Dentry
> > > > > > ffff88023f96e600{i=25f56f,n=} still in use (1) [unmount of nfs 0:f]
> > > > > 
> > > > > This might be the same problem I saw and described in rc1.
> > > > > However, for me the fs in the BUG() report was autofs.
> > > > > Hopefully that just means my autofs setup is different.
> > > > > 
> > > > > At the moment I believe a dentry leak Al Viro spotted is the cause.
> > > > > Please try this patch.
> > > > > 
> > > > > autofs4 - fix dentry leak in autofs4_expire_direct()
> > > > > 
> > > > > From: Ian Kent <raven@xxxxxxxxxx>
> > > > > 
> > > > > There is a missing dput() when returning from autofs4_expire_direct()
> > > > > when we see that the dentry is already a pending mount.
> > > > > 
> > > > > Signed-off-by: Ian Kent <raven@xxxxxxxxxx>
> > > > > ---
> > > > > 
> > > > >  fs/autofs4/expire.c |    7 +++----
> > > > >  1 files changed, 3 insertions(+), 4 deletions(-)
> > > > > 
> > > > > 
> > > > > diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
> > > > > index c896dd6..c403abc 100644
> > > > > --- a/fs/autofs4/expire.c
> > > > > +++ b/fs/autofs4/expire.c
> > > > > @@ -290,10 +290,8 @@ struct dentry *autofs4_expire_direct(struct super_block *sb,
> > > > >  	spin_lock(&sbi->fs_lock);
> > > > >  	ino = autofs4_dentry_ino(root);
> > > > >  	/* No point expiring a pending mount */
> > > > > -	if (ino->flags & AUTOFS_INF_PENDING) {
> > > > > -		spin_unlock(&sbi->fs_lock);
> > > > > -		return NULL;
> > > > > -	}
> > > > > +	if (ino->flags & AUTOFS_INF_PENDING)
> > > > > +		goto out;
> > > > >  	if (!autofs4_direct_busy(mnt, root, timeout, do_now)) {
> > > > >  		struct autofs_info *ino = autofs4_dentry_ino(root);
> > > > >  		ino->flags |= AUTOFS_INF_EXPIRING;
> > > > > @@ -301,6 +299,7 @@ struct dentry *autofs4_expire_direct(struct super_block *sb,
> > > > >  		spin_unlock(&sbi->fs_lock);
> > > > >  		return root;
> > > > >  	}
> > > > > +out:
> > > > >  	spin_unlock(&sbi->fs_lock);
> > > > >  	dput(root);
> > > > >  
> > > > > 
> > > > > > 
> > > > > > (sorry for the inconvenience Andrew)
> > > > > >  
> > > > > > On Tue, 2011-03-15 at 14:24 -0700, Andrew Morton wrote:
> > > > > > > (switched to email.  Please respond via emailed reply-to-all, not via the
> > > > > > > bugzilla web interface).
> > > > > > > 
> > > > > > > Seems that we have a nasty involving autofs, nfs and the VFS.
> > > > > > > 
> > > > > > > Mehmet, the kernel should have printed some diagnostics prior to doing
> > > > > > > the BUG() call:
> > > > > > > 
> > > > > > > 			if (dentry->d_count != 0) {
> > > > > > > 				printk(KERN_ERR
> > > > > > > 				       "BUG: Dentry %p{i=%lx,n=%s}"
> > > > > > > 				       " still in use (%d)"
> > > > > > > 				       " [unmount of %s %s]\n",
> > > > > > > 				       dentry,
> > > > > > > 				       dentry->d_inode ?
> > > > > > > 				       dentry->d_inode->i_ino : 0UL,
> > > > > > > 				       dentry->d_name.name,
> > > > > > > 				       dentry->d_count,
> > > > > > > 				       dentry->d_sb->s_type->name,
> > > > > > > 				       dentry->d_sb->s_id);
> > > > > > > 				BUG();
> > > > > > > 			}
> > > > > > > 
> > > > > > > > Please find those in the log and email them to us - someone might find
> > > > > > > it useful.
> > > > > > > 
> > > > > > > 
> > > > > > > On Tue, 15 Mar 2011 21:02:23 GMT
> > > > > > > bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> > > > > > > 
> > > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=30882
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > --- Comment #4 from Mehmet Giritli <mehmet@xxxxxxxxxx>  2011-03-15 21:02:22 ---
> > > > > > > > Here is that crash happening again, the system was NOT running overclocked or
> > > > > > > > anything...
> > > > > > > > 
> > > > > > > > [ 1860.156122] ------------[ cut here ]------------
> > > > > > > > [ 1860.156124] kernel BUG at fs/dcache.c:943!
> > > > > > > > [ 1860.156126] invalid opcode: 0000 [#1] SMP 
> > > > > > > > [ 1860.156127] last sysfs file: /sys/devices/platform/it87.552/fan3_input
> > > > > > > > [ 1860.156128] CPU 3 
> > > > > > > > [ 1860.156129] Modules linked in: iptable_mangle iptable_nat nf_nat ipt_LOG
> > > > > > > > xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state xt_mac iptable_filter
> > > > > > > > xt_multiport xt_mark xt_conntrack xt_connmark nf_conntrack ip_tables x_tables
> > > > > > > > nvidia(P)
> > > > > > > > [ 1860.156137] 
> > > > > > > > [ 1860.156139] Pid: 7388, comm: umount.nfs Tainted: P            2.6.38-rc8 #9
> > > > > > > > Gigabyte Technology Co., Ltd. GA-790FXTA-UD5/GA-790FXTA-UD5
> > > > > > > > [ 1860.156142] RIP: 0010:[<ffffffff810e9648>]  [<ffffffff810e9648>]
> > > > > > > > shrink_dcache_for_umount_subtree+0x268/0x270
> > > > > > > > [ 1860.156147] RSP: 0018:ffff8800be82fe08  EFLAGS: 00010296
> > > > > > > > [ 1860.156149] RAX: 0000000000000065 RBX: ffff88023f96e600 RCX:
> > > > > > > > 000000000003ffff
> > > > > > > > [ 1860.156150] RDX: ffffffff8161f888 RSI: 0000000000000046 RDI:
> > > > > > > > ffffffff8174c9f8
> > > > > > > > [ 1860.156151] RBP: ffff88023f96e600 R08: 0000000000012c37 R09:
> > > > > > > > 0000000000000006
> > > > > > > > [ 1860.156152] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > > > > > > ffff88023a07f5e0
> > > > > > > > [ 1860.156154] R13: ffff88023f96e65c R14: ffff8800be82ff18 R15:
> > > > > > > > ffff880211d38740
> > > > > > > > [ 1860.156155] FS:  00007f3428cb2700(0000) GS:ffff8800bfac0000(0000)
> > > > > > > > knlGS:00000000f74186c0
> > > > > > > > [ 1860.156156] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > > > > > > [ 1860.156157] CR2: 00007f7c97da1000 CR3: 00000000bea08000 CR4:
> > > > > > > > 00000000000006e0
> > > > > > > > [ 1860.156159] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > > > > > > 0000000000000000
> > > > > > > > [ 1860.156160] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > > > > > > > 0000000000000400
> > > > > > > > [ 1860.156161] Process umount.nfs (pid: 7388, threadinfo ffff8800be82e000, task
> > > > > > > > ffff880211fd5640)
> > > > > > > > [ 1860.156162] Stack:
> > > > > > > > [ 1860.156163]  ffff88020c05ce50 0000000000000000 ffff88023fc07128
> > > > > > > > ffff88020c05cc00
> > > > > > > > [ 1860.156165]  ffff88023f96e6c0 ffff8800be82ff28 ffff88023f96e300
> > > > > > > > ffffffff810e96a4
> > > > > > > > [ 1860.156167]  ffff88023f49f480 ffff88020c05cc00 ffffffff8146d4a0
> > > > > > > > ffffffff810d5d15
> > > > > > > > [ 1860.156169] Call Trace:
> > > > > > > > [ 1860.156172]  [<ffffffff810e96a4>] ? shrink_dcache_for_umount+0x54/0x60
> > > > > > > > [ 1860.156174]  [<ffffffff810d5d15>] ? generic_shutdown_super+0x25/0x100
> > > > > > > > [ 1860.156176]  [<ffffffff810d5e79>] ? kill_anon_super+0x9/0x40
> > > > > > > > [ 1860.156179]  [<ffffffff81179aed>] ? nfs_kill_super+0xd/0x20
> > > > > > > > [ 1860.156181]  [<ffffffff810d5f13>] ? deactivate_locked_super+0x43/0x70
> > > > > > > > [ 1860.156183]  [<ffffffff810ef4d8>] ? release_mounts+0x68/0x90
> > > > > > > > [ 1860.156185]  [<ffffffff810efa54>] ? sys_umount+0x314/0x3d0
> > > > > > > > [ 1860.156187]  [<ffffffff8100243b>] ? system_call_fastpath+0x16/0x1b
> > > > > > > > [ 1860.156188] Code: 8b 0a 31 d2 48 85 f6 74 07 48 8b 96 a8 00 00 00 48 05 50
> > > > > > > > 02 00 00 48 89 de 48 c7 c7 40 3a 52 81 48 89 04 24 31 c0 e8 a1 bc 35 00 <0f> 0b
> > > > > > > > eb fe 0f 0b eb fe 55 53 48 89 fb 48 8d 7f 68 48 83 ec 08 
> > > > > > > > [ 1860.156201] RIP  [<ffffffff810e9648>]
> > > > > > > > shrink_dcache_for_umount_subtree+0x268/0x270
> > > > > > > > [ 1860.156204]  RSP <ffff8800be82fe08>
> > > > > > > > [ 1860.156205] ---[ end trace ee03486c16c108a7 ]---
> > > > > > > > 
> > > > > > > > -- 
> > > > > > > > Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
> > > > > > > > ------- You are receiving this mail because: -------
> > > > > > > > You are on the CC list for the bug.
> > > > > > > 
> > > > > > 
> > > > > > 
> > > > > 
> > > > > 
> > > > 
> > > > 
> > > 
> > > 
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
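For anyone reading the BUG line quoted above against the printk() format
Andrew posted, here is a minimal user-space sketch in C that splits such a
line back into its fields. The struct dentry_bug and parse_dentry_bug()
names are hypothetical and exist only for this illustration; nothing like
them is in the kernel. The sketch assumes the line has been rejoined after
mail wrapping, and it handles the empty "n=" name seen in this report.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of a "BUG: Dentry ..." line. */
struct dentry_bug {
	unsigned long long addr;  /* %p: dentry address */
	unsigned long ino;        /* i=: inode number (hex) */
	char name[64];            /* n=: dentry name, may be empty */
	int count;                /* remaining d_count */
	char fstype[16];          /* filesystem type name */
	char sb_id[32];           /* superblock id */
};

/* Returns 0 on success, -1 if the line does not match the format. */
static int parse_dentry_bug(const char *line, struct dentry_bug *out)
{
	const char *p = strstr(line, "BUG: Dentry ");
	const char *end;
	size_t len;
	int used = 0;

	if (!p)
		return -1;
	/* %n stays untouched if the literal ",n=" fails to match. */
	if (sscanf(p, "BUG: Dentry %llx{i=%lx,n=%n",
		   &out->addr, &out->ino, &used) != 2 || used == 0)
		return -1;
	p += used;

	/* The name can be empty, which %[ cannot match, so copy by hand. */
	end = strchr(p, '}');
	if (!end)
		return -1;
	len = (size_t)(end - p);
	if (len >= sizeof(out->name))
		len = sizeof(out->name) - 1;
	memcpy(out->name, p, len);
	out->name[len] = '\0';

	if (sscanf(end, "} still in use (%d) [unmount of %15s %31[^]]",
		   &out->count, out->fstype, out->sb_id) != 3)
		return -1;
	return 0;
}
```

Fed the line from this report, the sketch should yield an empty name, a
count of 1, filesystem type "nfs" and superblock id "0:f", i.e. the
dentry that was still referenced belonged to the NFS mount being torn
down.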


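To make the leak that the quoted patch addresses easier to see, here is a
small user-space model in C. It is only a sketch: struct mock_dentry,
mock_dget(), mock_dput() and the two expire_direct_*() functions are
hypothetical stand-ins, not kernel code, and it assumes that the real
function takes a reference on root before the hunk shown in the patch
(the quoted diff does not show that part). Locking is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted dentry. */
struct mock_dentry {
	int refcount;
	bool pending;  /* plays the role of AUTOFS_INF_PENDING */
	bool busy;     /* plays the role of autofs4_direct_busy() being true */
};

static struct mock_dentry *mock_dget(struct mock_dentry *d)
{
	d->refcount++;
	return d;
}

static void mock_dput(struct mock_dentry *d)
{
	d->refcount--;
}

/* Pre-patch shape: the pending case returns without dropping the ref. */
static struct mock_dentry *expire_direct_leaky(struct mock_dentry *d)
{
	struct mock_dentry *root = mock_dget(d);

	if (root->pending)
		return NULL;      /* reference taken above is leaked */
	if (!root->busy)
		return root;      /* caller takes over the reference */
	mock_dput(root);
	return NULL;
}

/* Post-patch shape: the pending case shares the common exit path. */
static struct mock_dentry *expire_direct_fixed(struct mock_dentry *d)
{
	struct mock_dentry *root = mock_dget(d);

	if (root->pending)
		goto out;
	if (!root->busy)
		return root;      /* caller takes over the reference */
out:
	mock_dput(root);
	return NULL;
}
```

In this model each call to the leaky variant on a pending dentry leaves
the count one higher than before, while the fixed variant leaves it
unchanged. A count that never returns to zero is the kind of state the
"still in use" check at unmount time complains about.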

