Re: rm Tainted warning after kernel update.

Lukáš Czerner <lczerner@xxxxxxxxxx> · Wed, 16 Sep 2015 14:21:57 +0200 (CEST)

On Wed, 16 Sep 2015, Brian Foster wrote:

> Date: Wed, 16 Sep 2015 07:42:31 -0400
> From: Brian Foster <bfoster@xxxxxxxxxx>
> To: Lukáš Czerner <lczerner@xxxxxxxxxx>
> Cc: Grant Keller <grant.keller@xxxxxxxxx>, xfs@xxxxxxxxxxx
> Subject: Re: rm Tainted warning after kernel update.
> 
> On Wed, Sep 16, 2015 at 01:13:20PM +0200, Lukáš Czerner wrote:
> > On Wed, 16 Sep 2015, Brian Foster wrote:
> > 
> > > Date: Wed, 16 Sep 2015 06:50:23 -0400
> > > From: Brian Foster <bfoster@xxxxxxxxxx>
> > > To: Grant Keller <grant.keller@xxxxxxxxx>
> > > Cc: lczerner@xxxxxxxxxx, xfs@xxxxxxxxxxx
> > > Subject: Re: rm Tainted warning after kernel update.
> > > 
> > > cc Lukas
> > > 
> > > On Tue, Sep 15, 2015 at 03:32:22PM -0700, Grant Keller wrote:
> > > > On 09/15/2015 04:00 AM, Brian Foster wrote:
> > > > > On Mon, Sep 14, 2015 at 11:57:37AM -0700, Grant Keller wrote:
> > > > >> Hello,
> > > > >>
> > > > >> I have a server running Scientific Linux 6.7, and since updating to
> > > > >> kernel 2.6.32-573.3.1.el6.x86_64 the following error has begun appearing
> > > > >> in our message logs:
> > > > >>
> > > > >> Sep 14 11:43:03 localhost kernel: ------------[ cut here ]------------
> > > > >> Sep 14 11:43:03 localhost kernel: WARNING: at fs/dcache.c:758 d_delete+0x260/0x2c0() (Tainted: G        W  -- ------------   )
> > > > >> Sep 14 11:43:03 localhost kernel: Hardware name: X7DB8
> > > > >> Sep 14 11:43:03 localhost kernel: Modules linked in: nfsd nfs_acl
> > > > >> auth_rpcgss autofs4 lockd sunrpc p4_clockmod freq_table speedstep_lib
> > > > >> nf_conntrack_ftp iptable_mangle xt_comment nf_conntrack_ipv4
> > > > >> nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT
> > > > >> nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter
> > > > >> ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm
> > > > >> iw_cm ib_sa ib_mad ib_core ib_addr ipv6 xfs exportfs ppdev parport_pc
> > > > >> parport sg e1000e microcode serio_raw iTCO_wdt iTCO_vendor_support ixgbe
> > > > >> ptp pps_core mdio i2c_i801 lpc_ich mfd_core i5000_edac edac_core i5k_amb
> > > > >> ioatdma dca shpchp ext4 jbd2 mbcache sd_mod crc_t10dif 3w_9xxx pata_acpi
> > > > >> ata_generic ata_piix radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core
> > > > >> dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipmi_msghandler]
> > > > >> Sep 14 11:43:03 localhost kernel: Pid: 15893, comm: rm Tainted: G        W  -- ------------    2.6.32-573.3.1.el6.x86_64 #1
> > > > >> Sep 14 11:43:03 localhost kernel: Call Trace:
> > > > >> Sep 14 11:43:03 localhost kernel: [<ffffffff81077491>] ? warn_slowpath_common+0x91/0xe0
> > > > >> Sep 14 11:43:03 localhost kernel: [<ffffffff810774fa>] ? warn_slowpath_null+0x1a/0x20
> > > > >> Sep 14 11:43:03 localhost kernel: [<ffffffff811ae660>] ? d_delete+0x260/0x2c0
> > > > >> Sep 14 11:43:03 localhost kernel: [<ffffffff811a0908>] ? vfs_rmdir+0xe8/0xf0
> > > > >> Sep 14 11:43:03 localhost kernel: [<ffffffff811a3b64>] ? do_rmdir+0x184/0x1f0
> > > > >> Sep 14 11:43:03 localhost kernel: [<ffffffff81193511>] ? __fput+0x1a1/0x210
> > > > >> Sep 14 11:43:03 localhost kernel: [<ffffffff810e8ab7>] ? audit_syscall_entry+0x1d7/0x200
> > > > >> Sep 14 11:43:03 localhost kernel: [<ffffffff811a3bfd>] ? sys_unlinkat+0x2d/0x40
> > > > >> Sep 14 11:43:03 localhost kernel: [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
> > > > >> Sep 14 11:43:03 localhost kernel: ---[ end trace 6080ec4a7ec5ec25 ]---
> > > > >>
> > > > >> This happens when we are expiring older backups from the archives, so I
> > > > >> have quite a few of these. We have xfsprogs 3.1.1-16.el6.x86_64
> > > > >> installed. Looking for advice on how to proceed.
> > > > >>
> > > > > This looks like something funky going on in the vfs. The warning is from
> > > > > unhash_offsprings() and it appears to be complaining about a refcount on
> > > > > a dentry that is a child of a directory being removed. It checks a
> > > > > refcount on a dentry in one loop and either drops it or moves it to
> > > > > another list for apparent deletion. The second iteration of the
> > > > > aforementioned list sees a refcount on an object that wasn't there
> > > > > before.
> > > > >
> > > > > I suspect this means something is going from 0->1 unexpectedly, but I'm
> > > > > not familiar enough with that code to grok why that shouldn't happen and
> > > > > how it could without reproducing it and digging into it from there. Have
> > > > > you identified an explicit reproducer? I assume files are simply being
> > > > > removed with 'rm -rf' here..? If so, does anything else have access to
> > > > > this directory structure (e.g., separate commands, a running backup
> > > > > application?) at the time of removal?
> > > > There could be something else running, but I would have to investigate
> > > > the next time this happens. The rm -rf is called by our backup program
> > > > expiring older backups from the filesystem.  The thing is, the
> > > > expirations happen on a nightly basis, but we don't always see these
> > > > warnings in the logs. On the nights we do, there are 1000+ warnings.
> > > > >
> > > > > Also, what kernel were you running before this started to occur?
> > > >  2.6.32-573.el6.x86_64 was the previous kernel.
> > > 
> > > Interesting... there are only a few fs changes between this kernel and
> > > the current. One of them is this:
> > > 
> > >   959c503 [fs] vfs: Unhash and evict unused children dentries after rmdir
> > > 
> > > ... which actually introduces the unhash_offsprings() thing. I've cc'd
> > > Lukas who is probably more familiar with this code.
> > > 
> > > FWIW, I suspect the more you can elaborate on what the backup
> > > application might be doing here (beyond just the rm -rf), the more
> > > likely this can be reproduced and resolved.
> > 
> > Hi, the lockdep warning should be fixed in the recent RHEL6 kernel.
> > Sorry about that.
> > 
> 
> My understanding is that this patch has been reverted...
> 
> Grant,
> 
> If a newer kernel is not yet available, you might want to revert to the
> previous version until one is and then go straight to that.
> 
> Lukas,
> 
> If the patch is indeed reverted, then clearly this problem will go away.
> That aside, where does lockdep come into play? Note the warning reported
> above:
> 
> 	WARNING: at fs/dcache.c:758 d_delete+0x260/0x2c0()
> 
> ... is an explicit WARN_ON() in the code added by this commit. Are we
> referring to the same issue here?
> 
> Brian

Hi Brian,

It seems we're not referring to the same issue. I thought we were
talking about the lockdep problem, but apparently we're not.
Can we get a RH Bugzilla entry for this issue so it can be dealt with
properly?

Thanks!
-Lukas
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs