{In future, can you make sure you don't line-wrap stack traces? They turn
into an utter mess when being quoted if you wrap them.}

On Fri, Apr 15, 2011 at 09:23:50AM +0200, Yann Dupont wrote:
> On 07/04/2011 08:19, Dave Chinner wrote:
> > This series fixes an OOM problem where VFS-only dirty inodes
> > accumulate on an XFS filesystem due to atime updates causing OOM to
> > occur.
> >
> > The first patch fixes a deadlock triggering bdi-flusher writeback
> > from memory reclaim when a new bdi-flusher thread needs to be forked
> > and no memory is available.
> >
> > The second adds a bdi-flusher kick from XFS's inode cache shrinker
> > so that when memory is low the VFS starts writing back dirty inodes
> > so they can be reclaimed as they get cleaned rather than remaining
> > dirty and pinning the inode cache in memory.
>
> Hello, we've been hit for some time by a bug (OOM) which may be
> related to this one. Our server hosts lots of Samba servers (in
> linux-vserver, so this is NOT a vanilla kernel) and is also an NFS
> kernel server.
> The OOM generally happens after 1 month of uptime, and last week we
> also had the problem after 1 week.
>
> For example this one: ....
>
> [2743777.877340] Pid: 10121, comm: admind Not tainted 2.6.32-5-vserver-amd64 #1

Vserver. uggh.

> Call Trace:
> <IRQ>  [<ffffffff810c3f43>] ? __alloc_pages_nodemask+0x592/0x5f3
> [<ffffffff810f0d1e>] ? new_slab+0x5b/0x1ca
> [<ffffffff810f107d>] ? __slab_alloc+0x1f0/0x39b
> [<ffffffff812565c8>] ? __netdev_alloc_skb+0x29/0x45
> [<ffffffff810f1aaf>] ? __kmalloc_node_track_caller+0xbb/0x11b
> [<ffffffff812565c8>] ? __netdev_alloc_skb+0x29/0x45
> [<ffffffff812555f5>] ? __alloc_skb+0x69/0x15a
> [<ffffffff812565c8>] ? __netdev_alloc_skb+0x29/0x45
> [<ffffffffa00af52a>] ? bnx2_alloc_rx_skb+0x4c/0x1a3 [bnx2]
> [<ffffffffa00b34fb>] ? bnx2_poll_work+0x4f3/0xa7e [bnx2]
> [<ffffffffa00b3c47>] ? bnx2_poll+0x11b/0x229 [bnx2]
> [<ffffffff8125c851>] ? net_rx_action+0xae/0x1c9
> [<ffffffff8105430b>] ? __do_softirq+0xdd/0x1a2
> [<ffffffff81011cac>] ? call_softirq+0x1c/0x30
> [<ffffffff8101322b>] ? do_softirq+0x3f/0x7c
> [<ffffffff8105417a>] ? irq_exit+0x36/0x76
> [<ffffffff81012922>] ? do_IRQ+0xa0/0xb6
> [<ffffffff810114d3>] ? ret_from_intr+0x0/0x11
> <EOI>  [<ffffffffa02304cf>] ? xfs_reclaim_inode+0x0/0xe0 [xfs]
> [<ffffffff8130a7c5>] ? _write_lock+0x7/0xf
> [<ffffffffa0230e3d>] ? xfs_inode_ag_walk+0x4e/0xef [xfs]
> [<ffffffffa02304cf>] ? xfs_reclaim_inode+0x0/0xe0 [xfs]
> [<ffffffffa0230f4f>] ? xfs_inode_ag_iterator+0x71/0xb2 [xfs]
> [<ffffffffa02304cf>] ? xfs_reclaim_inode+0x0/0xe0 [xfs]
> [<ffffffffa0230feb>] ? xfs_reclaim_inode_shrink+0x5b/0x10d [xfs]
> [<ffffffff810c8dd1>] ? shrink_slab+0xe0/0x153
> [<ffffffff810c9d2e>] ? try_to_free_pages+0x26a/0x38e
> [<ffffffff810c6ceb>] ? isolate_pages_global+0x0/0x20f
> [<ffffffff810c3d7e>] ? __alloc_pages_nodemask+0x3cd/0x5f3
> [<ffffffff810f0d05>] ? new_slab+0x42/0x1ca
> [<ffffffff810f107d>] ? __slab_alloc+0x1f0/0x39b
> [<ffffffff8110437f>] ? getname+0x23/0x1a0
> [<ffffffff8110437f>] ? getname+0x23/0x1a0
> [<ffffffff810f1558>] ? kmem_cache_alloc+0x7f/0xf0
> [<ffffffff8110437f>] ? getname+0x23/0x1a0
> [<ffffffff810f75b3>] ? do_sys_open+0x1d/0xfc
> [<ffffffff81037623>] ? ia32_sysret+0x0/0x5

This, I'd say, has nothing to do with XFS - the system has taken a
network interrupt and failed an allocation in the bnx2 NIC driver. You
chopped off the line that describes the actual allocation parameters
that failed, so I can't really say why it failed...

> Some questions:
>
> - What kernel versions are known to be impacted?

No idea.
It was reported on a .38-rc kernel, and I don't have the bandwidth to do
a "which versions does it affect" search.

> - What is the plan for inclusion in the kernel? Is this considered
> appropriate material for 2.6.38.4 and older stable kernels?

None right now - the patch is dead in the water because of lock
inversion issues it causes. Even so, I doubt I'd be backporting it to
any stable kernel without anyone reporting that it is the root cause of
their OOM problems.

> - Can mounting with noatime alleviate the problem?

The problem that the patches I posted were supposed to fix, yes. The
problem you are reporting here, most likely not.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
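For readers skimming the archive: the "bdi-flusher kick from XFS's inode
cache shrinker" mentioned in the quoted cover letter can be pictured
roughly as below. This is a minimal, hypothetical sketch of the general
idea, not Dave's actual patch; the demo_* names are invented, and the
three-argument shrink callback and wakeup_flusher_threads() call follow
the 2.6.3x-era kernel APIs being discussed in this thread.

/*
 * Hypothetical sketch: when the VM invokes a filesystem's inode-cache
 * shrinker under memory pressure, also kick the bdi-flusher threads so
 * dirty inodes get written back and become reclaimable, rather than
 * sitting dirty and pinning the inode cache in memory.
 *
 * NOT the actual XFS patch from this thread; demo_* names are invented.
 */
#include <linux/init.h>
#include <linux/mm.h>		/* struct shrinker, register_shrinker(), DEFAULT_SEEKS */
#include <linux/writeback.h>	/* wakeup_flusher_threads() */

static int demo_cache_shrink(struct shrinker *shrink, int nr_to_scan,
			     gfp_t gfp_mask)
{
	if (nr_to_scan) {
		/*
		 * Memory is low: wake the per-bdi flusher threads so they
		 * start writing back dirty data (passing 0 lets the kernel
		 * pick how much), then reclaim whatever is already clean.
		 */
		wakeup_flusher_threads(0);

		/* ... walk the cache and free clean, unused objects ... */
	}

	/* Return an estimate of how many reclaimable objects remain. */
	return 0;
}

static struct shrinker demo_shrinker = {
	.shrink	= demo_cache_shrink,
	.seeks	= DEFAULT_SEEKS,
};

static int __init demo_shrinker_init(void)
{
	register_shrinker(&demo_shrinker);
	return 0;
}
late_initcall(demo_shrinker_init);

As noted in the reply above, the real patch ran into lock inversion
issues and is currently on hold, so treat this purely as an illustration
of the concept.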