On Mon, Jun 23, 2014 at 04:27:08PM +0900, Masayoshi Mizuma wrote:
> Hi Dave,
>
> (I removed CCing xfs and linux-mm. And I changed your email address
> to @redhat.com because this email includes RHEL7 kernel stack traces.)

Please don't do that. There's nothing wrong with posting RHEL7 stack
traces to public lists (though I'd prefer you to reproduce this
problem on a 3.15 or 3.16-rc kernel), and breaking the thread of
discussion makes it impossible to involve the people necessary to
solve this problem.

I've re-added xfs and linux-mm to the cc list, and taken my redhat
address off it...

<snip the 3 process back traces>

[looks at sysrq-w output]

kswapd0 is blocked in shrink_inactive_list/congestion_wait().

kswapd1 is blocked waiting for log space from shrink_inactive_list().

kthreadd is blocked in shrink_inactive_list/congestion_wait trying
to fork another process.

xfsaild is in uninterruptible sleep, indicating that there is still
metadata to be written to push the log tail to its required target,
and it will retry again in less than 20ms.

xfslogd is not blocked, indicating the log has not deadlocked
due to lack of space.

There are lots of timestamp updates waiting for log space.

There is one kworker stuck in data IO completion on an inode lock.

There are several threads blocked on an AGF lock trying to free
extents.

The bdi writeback thread is blocked waiting for allocation.

A single xfs_alloc_wq kworker is blocked in
shrink_inactive_list/congestion_wait while trying to read in btree
blocks for transactional modification. This is indicative of memory
pressure trashing the working set of cached metadata. The kworker is
waiting for memory reclaim while holding the AGF lock, and so it
blocks the unlinks.

There are 113 (!) blocked sadc processes - why are there so many
stats gathering processes running? If you stop gathering stats, does
the problem go away?

There are 54 mktemp processes blocked - what is generating them?
What filesystem are they actually running on? i.e.
which XFS filesystem in the system is having log space shortages?
And what is the xfs_info output of that filesystem, i.e. have you
simply oversubscribed a tiny log and so it crawls along at a very
slow pace?

All of the blocked processes are on CPUs 0-3, i.e. on node 0, which
is handled by kswapd0, which is not blocked waiting for log space.

Hmmm - what is the value of /proc/sys/vm/zone_reclaim_mode? If it is
not zero, does setting it to zero make the problem go away?

Interestingly enough, for a system under extreme memory pressure,
I don't see any processes blocked waiting for swap space or swap IO.
Do you have any swap space configured on this machine? If you
don't, does the problem go away when you add a swap device?

Overall, I can't see anything that indicates that the filesystem has
actually hung. I can see it having trouble allocating the memory it
needs to make forwards progress, but the system itself is not
deadlocked. Is there any IO being issued when the system is in this
state? If there is IO being issued, then progress is being made and
the system is merely slow because of the extreme memory pressure
generated by the stress test.

If there is not IO being issued, does the system start making
progress again if you kill one of the memory hogs? i.e. does the
equivalent of triggering an OOM-kill make the system responsive
again? If it does, then the filesystem is not hung and the problem
is that there isn't enough free memory to allow the filesystem to do
IO and hence allow memory reclaim to make progress. In which case,
does increasing /proc/sys/vm/min_free_kbytes make the problem go
away?

Cheers,

Dave.
-- 
Dave Chinner
dchinner@xxxxxxxxxx
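
The memory-related settings asked about above can be gathered in one
pass. What follows is a minimal sketch, not something from the
original thread. The helper names (read_int, read_text, parse_swaps,
report) are made up for illustration. It assumes a Linux system where
/proc/sys/vm/zone_reclaim_mode, /proc/sys/vm/min_free_kbytes and
/proc/swaps are readable, and it only reports the values without
changing them.

```python
from pathlib import Path


def read_text(path):
    """Return the file's content, or an empty string if it cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


def read_int(path):
    """Return the integer stored in a /proc tunable, or None if unavailable."""
    try:
        return int(read_text(path).split()[0])
    except (ValueError, IndexError):
        return None


def parse_swaps(text):
    """Return the swap device names from /proc/swaps content.

    The first line of /proc/swaps is a column header, so it is skipped.
    """
    return [line.split()[0] for line in text.splitlines()[1:] if line.strip()]


def report(zone_reclaim_mode, min_free_kbytes, swap_devices):
    """Build one summary line per question raised in the mail."""
    lines = []
    if zone_reclaim_mode is None:
        lines.append("zone_reclaim_mode: not readable")
    elif zone_reclaim_mode != 0:
        lines.append(f"zone_reclaim_mode={zone_reclaim_mode} (non-zero; try 0)")
    else:
        lines.append("zone_reclaim_mode=0")

    if min_free_kbytes is None:
        lines.append("min_free_kbytes: not readable")
    else:
        lines.append(f"min_free_kbytes={min_free_kbytes}")

    if swap_devices:
        lines.append("swap devices: " + ", ".join(swap_devices))
    else:
        lines.append("no swap configured")
    return lines


if __name__ == "__main__":
    for line in report(
        read_int("/proc/sys/vm/zone_reclaim_mode"),
        read_int("/proc/sys/vm/min_free_kbytes"),
        parse_swaps(read_text("/proc/swaps")),
    ):
        print(line)
```

Running it while the system is in the slow state, and posting the
output together with the xfs_info output, would answer several of the
questions above in one reply.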