On Sat, Oct 02, 2010 at 08:10:02PM -0300, Carlos Carvalho wrote:
> We have serious problems with 34.6 in a machine with ~11TiB xfs, with
> a lot of simultaneous IO, particularly hundreds of rm and a sync
> afterwards. Maybe they're related to these issues.
>
> The machine is a file server (almost all via http/apache) and has
> several thousand connections all the time. It behaves quite well for
> at most 4 days; from then on kswapd's start appearing on the display
> of top consuming ever increasing percentages of cpu. This is no
> problem, the machine has 16 nearly idle cores. However, after about
> 5-7 days there's an abrupt transition: in about 30s the load goes to
> several thousand, apache shows up consuming all possible cpu and
> downloads nearly stop. I have to reboot the machine to get service
> back. It manages to unmount the filesystems and reboot properly.
>
> Stopping/restarting apache restores the situation but only for
> a short while; after about 2-3h the problem reappears. That's why I
> have to reboot.
>
> With 35.6 the behaviour seems to have changed: now often
> CONFIG_DETECT_HUNG_TASK produces this kind of call trace in the log:
>
> [<ffffffff81098578>] ? igrab+0x10/0x30
> [<ffffffff811160fe>] ? xfs_sync_inode_valid+0x4c/0x76
> [<ffffffff81116241>] ? xfs_sync_inode_data+0x1b/0xa8
> [<ffffffff811163e0>] ? xfs_inode_ag_walk+0x96/0xe4
> [<ffffffff811163dd>] ? xfs_inode_ag_walk+0x93/0xe4
> [<ffffffff81116226>] ? xfs_sync_inode_data+0x0/0xa8
> [<ffffffff81116495>] ? xfs_inode_ag_iterator+0x67/0xc4
> [<ffffffff81116226>] ? xfs_sync_inode_data+0x0/0xa8
> [<ffffffff810a48dd>] ? sync_one_sb+0x0/0x1e
> [<ffffffff81116712>] ? xfs_sync_data+0x22/0x42
> [<ffffffff810a48dd>] ? sync_one_sb+0x0/0x1e
> [<ffffffff8111678b>] ? xfs_quiesce_data+0x2b/0x94
> [<ffffffff81113f03>] ? xfs_fs_sync_fs+0x2d/0xd7
> [<ffffffff810a48dd>] ? sync_one_sb+0x0/0x1e
> [<ffffffff810a48c4>] ? __sync_filesystem+0x62/0x7b
> [<ffffffff8108993e>] ? iterate_supers+0x60/0x9d
> [<ffffffff810a493a>] ? sys_sync+0x3f/0x53
> [<ffffffff81001dab>] ? system_call_fastpath+0x16/0x1b
>
> It doesn't seem to cause service disruption (at least the flux graphs
> don't show drops). I didn't see it happen while I was watching so it
> may be that service degrades for short intervals. Uptime with 35.6 is
> only 3d8h so it's still not sure that the breakdown of 34.6 is gone
> but kswapd's cpu usages are very small, less than with 34.6 for a
> similar uptime. There are only 2 filesystems, and the big one has 256
> AGs. They're not mounted with delaylog.

Apply this:

http://www.oss.sgi.com/archives/xfs/2010-10/msg00000.html

And in future, can you please report bugs in a new thread to the
appropriate lists (xfs@xxxxxxxxxxx), not as a reply to a completely
unrelated development thread....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx