I have bisected it, and it seems to have been introduced here. How could that be?

54a6eb5c4765aa573a030ceeba2c14e3d2ea5706 is first bad commit
commit 54a6eb5c4765aa573a030ceeba2c14e3d2ea5706
Author: Mel Gorman <mel@xxxxxxxxx>
Date:   Mon Apr 28 02:12:16 2008 -0700

    mm: use two zonelist that are filtered by GFP mask

    Currently a node has two sets of zonelists, one for each zone type in the
    system and a second set for GFP_THISNODE allocations.  Based on the zones
    allowed by a gfp mask, one of these zonelists is selected.  All of these
    zonelists consume memory and occupy cache lines.

    This patch replaces the multiple zonelists per-node with two zonelists.  The
    first contains all populated zones in the system, ordered by distance, for
    fallback allocations when the target/preferred node has no free pages.  The
    second contains all populated zones in the node suitable for GFP_THISNODE
    allocations.

    An iterator macro is introduced called for_each_zone_zonelist() that
    iterates through each zone allowed by the GFP flags in the selected
    zonelist.

    Signed-off-by: Mel Gorman <mel@xxxxxxxxx>
    Acked-by: Christoph Lameter <clameter@xxxxxxx>
    Signed-off-by: Lee Schermerhorn <lee.schermerhorn@xxxxxx>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
    Cc: Mel Gorman <mel@xxxxxxxxx>
    Cc: Christoph Lameter <clameter@xxxxxxx>
    Cc: Hugh Dickins <hugh@xxxxxxxxxxx>
    Cc: Nick Piggin <nickpiggin@xxxxxxxxxxxx>
    Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>

:040000 040000 89cdad93d855fa839537454113f2716011ca0e26 57aa307f4bddd264e70c759a2fb2076bfde363eb M      arch
:040000 040000 4add802178c0088a85d3738b42ec42ca33e07d60 126d3b170424a18b60074a7901c4e9b98f3bdee5 M      fs
:040000 040000 9d215d6248382dab53003d230643f0169f3e3e84 67d196d890a27d2211b3bf7e833e6366addba739 M      include
:040000 040000 6502d185e8ea6338953027c29cc3ab960d6f9bad c818e0fc538cdc40016e2d5fe33661c9c54dc8a5 M      mm
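For reference, the iterator this commit introduces is used roughly like this (a minimal sketch pieced together from the commit message; the gfp_mask/zonelist variables, the high_zoneidx name and the cursor type follow the 2.6.26-era headers as best I recall, so treat everything here as illustrative rather than code quoted from the kernel):

    struct zoneref *z;
    struct zone *zone;
    enum zone_type high_zoneidx = gfp_zone(gfp_mask);

    /* Visit every populated zone the GFP mask allows, nearest first;
     * 'zonelist' is the node's single all-zones fallback list. */
    for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
            /* 'zone' is a candidate zone for this allocation */
    }

Anyway, here is the full bisect log: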
git-bisect start
# bad: [28a4acb48586dc21d2d14a75a7aab7be78b7c83b] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
git-bisect bad 28a4acb48586dc21d2d14a75a7aab7be78b7c83b
# good: [4b119e21d0c66c22e8ca03df05d9de623d0eb50f] Linux 2.6.25
git-bisect good 4b119e21d0c66c22e8ca03df05d9de623d0eb50f
# good: [fdfc7452f17eb65eb29a143cf992ea2b8d262c7a] V4L/DVB (7626): Kconfig: VIDEO_AU0828 should select DVB_AU8522 and DVB_TUNER_XC5000
git-bisect good fdfc7452f17eb65eb29a143cf992ea2b8d262c7a
# bad: [96fffeb4b413a4f8f65bb627d59b7dfc97ea0b39] make CC_OPTIMIZE_FOR_SIZE non-experimental
git-bisect bad 96fffeb4b413a4f8f65bb627d59b7dfc97ea0b39
# good: [ce1d5b23a8d1e19866ab82bdec0dc41fde5273d8] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git-bisect good ce1d5b23a8d1e19866ab82bdec0dc41fde5273d8
# good: [69a9f69bb24d6d3dbf3d2ba542ddceeda40536d5] KVM: Move some x86 specific constants and structures to include/asm-x86
git-bisect good 69a9f69bb24d6d3dbf3d2ba542ddceeda40536d5
# bad: [e26831814998cee8e6d9f0a9854cb46c516f5547] pageflags: use an enum for the flags
git-bisect bad e26831814998cee8e6d9f0a9854cb46c516f5547
# good: [42cadc86008aae0fd9ff31642dc01ed50723cf32] Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
git-bisect good 42cadc86008aae0fd9ff31642dc01ed50723cf32
# good: [e5fc9cc0266e5babcf84c81908ec8843b7e3349f] rtc-pcf8563: new style conversion
git-bisect good e5fc9cc0266e5babcf84c81908ec8843b7e3349f
# bad: [797df5749032c2286bc7ff3a52de41fde0cdf0a5] mm: try both endianess when checking for endianess
git-bisect bad 797df5749032c2286bc7ff3a52de41fde0cdf0a5
# good: [488514d1798289f56f80ed018e246179fe500383] Remove set_migrateflags()
git-bisect good 488514d1798289f56f80ed018e246179fe500383
# good: [dac1d27bc8d5ca636d3014ecfdf94407031d1970] mm: use zonelists instead of zones when direct reclaiming pages
git-bisect good dac1d27bc8d5ca636d3014ecfdf94407031d1970
# bad: [54a6eb5c4765aa573a030ceeba2c14e3d2ea5706] mm: use two zonelist that are filtered by GFP mask
git-bisect bad 54a6eb5c4765aa573a030ceeba2c14e3d2ea5706
# good: [18ea7e710d2452fa726814a406779188028cf1bf] mm: remember what the preferred zone is for zone_statistics
git-bisect good 18ea7e710d2452fa726814a406779188028cf1bf

To recap the log message (it still happens on -rc5): the machine hangs for a few seconds. That is the worst of it, but even that should not happen; I can catch it within the first hour of running.

[ INFO: possible circular locking dependency detected ]
2.6.26-rc5-00084-g39b945a #3
-------------------------------------------------------
nfsd/3457 is trying to acquire lock:
 (iprune_mutex){--..}, at: [<c016fb6c>] shrink_icache_memory+0x38/0x19b

but task is already holding lock:
 (&(&ip->i_iolock)->mr_lock){----}, at: [<c021108f>] xfs_ilock+0xa2/0xd6

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&(&ip->i_iolock)->mr_lock){----}:
       [<c0135416>] __lock_acquire+0xa0c/0xbc6
       [<c013563a>] lock_acquire+0x6a/0x86
       [<c012c4f2>] down_write_nested+0x33/0x6a
       [<c0211068>] xfs_ilock+0x7b/0xd6
       [<c02111e1>] xfs_ireclaim+0x1d/0x59
       [<c022f342>] xfs_finish_reclaim+0x173/0x195
       [<c0231496>] xfs_reclaim+0xb3/0x138
       [<c023ba0f>] xfs_fs_clear_inode+0x55/0x8e
       [<c016f830>] clear_inode+0x83/0xd2
       [<c016faaf>] dispose_list+0x3c/0xc1
       [<c016fca7>] shrink_icache_memory+0x173/0x19b
       [<c014a7fa>] shrink_slab+0xda/0x153
       [<c014aa53>] try_to_free_pages+0x1e0/0x2a1
       [<c0146ad7>] __alloc_pages_internal+0x23f/0x3a7
       [<c0146c56>] __alloc_pages+0xa/0xc
       [<c015b8c2>] __slab_alloc+0x1c7/0x513
       [<c015beef>] kmem_cache_alloc+0x45/0xb3
       [<c01a5afe>] reiserfs_alloc_inode+0x12/0x23
       [<c016f308>] alloc_inode+0x14/0x1a9
       [<c016f5ed>] iget5_locked+0x47/0x133
       [<c019dffd>] reiserfs_iget+0x29/0x7d
       [<c019b655>] reiserfs_lookup+0xb1/0xee
       [<c01657c2>] do_lookup+0xa9/0x146
       [<c0166deb>] __link_path_walk+0x734/0xb2f
       [<c016722f>] path_walk+0x49/0x96
       [<c01674e0>] do_path_lookup+0x12f/0x149
       [<c0167d08>] __user_walk_fd+0x2f/0x48
       [<c0162157>] vfs_lstat_fd+0x16/0x3d
       [<c01621e9>] vfs_lstat+0x11/0x13
       [<c01621ff>] sys_lstat64+0x14/0x28
       [<c0102bb9>] sysenter_past_esp+0x6a/0xb1
       [<ffffffff>] 0xffffffff

-> #0 (iprune_mutex){--..}:
       [<c0135333>] __lock_acquire+0x929/0xbc6
       [<c013563a>] lock_acquire+0x6a/0x86
       [<c037db3e>] mutex_lock_nested+0xba/0x232
       [<c016fb6c>] shrink_icache_memory+0x38/0x19b
       [<c014a7fa>] shrink_slab+0xda/0x153
       [<c014aa53>] try_to_free_pages+0x1e0/0x2a1
       [<c0146ad7>] __alloc_pages_internal+0x23f/0x3a7
       [<c0146c56>] __alloc_pages+0xa/0xc
       [<c01484f2>] __do_page_cache_readahead+0xaa/0x16a
       [<c01487ac>] ondemand_readahead+0x119/0x127
       [<c014880c>] page_cache_async_readahead+0x52/0x5d
       [<c0179410>] generic_file_splice_read+0x290/0x4a8
       [<c023a46a>] xfs_splice_read+0x4b/0x78
       [<c0237c78>] xfs_file_splice_read+0x24/0x29
       [<c0178712>] do_splice_to+0x45/0x63
       [<c017899e>] splice_direct_to_actor+0xc3/0x190
       [<c01ceddd>] nfsd_vfs_read+0x1ed/0x2d0
       [<c01cf24c>] nfsd_read+0x82/0x99
       [<c01d47b8>] nfsd3_proc_read+0xdf/0x12a
       [<c01cb907>] nfsd_dispatch+0xcf/0x19e
       [<c036356c>] svc_process+0x3b3/0x68b
       [<c01cbe35>] nfsd+0x168/0x26b
       [<c01037db>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

3 locks held by nfsd/3457:
 #0:  (hash_sem){..--}, at: [<c01d1a34>] exp_readlock+0xd/0xf
 #1:  (&(&ip->i_iolock)->mr_lock){----}, at: [<c021108f>] xfs_ilock+0xa2/0xd6
 #2:  (shrinker_rwsem){----}, at: [<c014a744>] shrink_slab+0x24/0x153

stack backtrace:
Pid: 3457, comm: nfsd Not tainted 2.6.26-rc5-00084-g39b945a #3
 [<c01335c8>] print_circular_bug_tail+0x5a/0x65
 [<c0133ec9>] ? print_circular_bug_header+0xa8/0xb3
 [<c0135333>] __lock_acquire+0x929/0xbc6
 [<c013563a>] lock_acquire+0x6a/0x86
 [<c016fb6c>] ? shrink_icache_memory+0x38/0x19b
 [<c037db3e>] mutex_lock_nested+0xba/0x232
 [<c016fb6c>] ? shrink_icache_memory+0x38/0x19b
 [<c016fb6c>] ? shrink_icache_memory+0x38/0x19b
 [<c016fb6c>] shrink_icache_memory+0x38/0x19b
 [<c014a7fa>] shrink_slab+0xda/0x153
 [<c014aa53>] try_to_free_pages+0x1e0/0x2a1
 [<c0149993>] ? isolate_pages_global+0x0/0x3e
 [<c0146ad7>] __alloc_pages_internal+0x23f/0x3a7
 [<c0146c56>] __alloc_pages+0xa/0xc
 [<c01484f2>] __do_page_cache_readahead+0xaa/0x16a
 [<c01487ac>] ondemand_readahead+0x119/0x127
 [<c014880c>] page_cache_async_readahead+0x52/0x5d
 [<c0179410>] generic_file_splice_read+0x290/0x4a8
 [<c037f425>] ? _spin_unlock+0x27/0x3c
 [<c025140d>] ? _atomic_dec_and_lock+0x25/0x30
 [<c01355b4>] ? __lock_acquire+0xbaa/0xbc6
 [<c01787d5>] ? spd_release_page+0x0/0xf
 [<c023a46a>] xfs_splice_read+0x4b/0x78
 [<c0237c78>] xfs_file_splice_read+0x24/0x29
 [<c0178712>] do_splice_to+0x45/0x63
 [<c017899e>] splice_direct_to_actor+0xc3/0x190
 [<c01ceec0>] ? nfsd_direct_splice_actor+0x0/0xf
 [<c01ceddd>] nfsd_vfs_read+0x1ed/0x2d0
 [<c01cf24c>] nfsd_read+0x82/0x99
 [<c01d47b8>] nfsd3_proc_read+0xdf/0x12a
 [<c01cb907>] nfsd_dispatch+0xcf/0x19e
 [<c036356c>] svc_process+0x3b3/0x68b
 [<c01cbe35>] nfsd+0x168/0x26b
 [<c01cbccd>] ? nfsd+0x0/0x26b
 [<c01037db>] kernel_thread_helper+0x7/0x10
 =======================

2008/5/16 David Chinner <dgc@xxxxxxx>:
> On Thu, May 15, 2008 at 09:45:55PM +0400, Alexander Beregalov wrote:
>> 2008/5/12 David Chinner <dgc@xxxxxxx>:
>> > On Sun, May 11, 2008 at 09:18:07AM +0530, Kamalesh Babulal wrote:
>> >> Kamalesh Babulal wrote:
>> >> > Adding the cc to kernel-list, Ingo Molnar and Peter Zijlstra
>> >> >
>> >> > Alexander Beregalov wrote:
>> >> >> [ INFO: possible circular locking dependency detected ]
>> >> >> 2.6.26-rc1-00279-g28a4acb #13
>> >> >> -------------------------------------------------------
>> >> >> nfsd/3087 is trying to acquire lock:
>> >> >> (iprune_mutex){--..}, at: [<c016f947>] shrink_icache_memory+0x38/0x19b
>> >> >>
>> >> >> but task is already holding lock:
>> >> >> (&(&ip->i_iolock)->mr_lock){----}, at: [<c0210b83>] xfs_ilock+0xa2/0xd6
>
> [snip]
>
>> > Oh, yeah, that. Direct inode reclaim through memory pressure.
>> >
>> > Effectively memory reclaim inverts locking order w.r.t. iprune_mutex
>> > when it recurses into the filesystem. False positive - can never
>> > cause a deadlock on XFS. Can't be solved from the XFS side of things
>> > without effectively turning off lockdep checking for xfs inode
>> > locking.
>> Yes, it is not a deadlock, but machine hangs for few seconds.
>> It still happens about once a day for me. Every kernel report looks
>> similar to the above.
>
> That hang is just memory reclaim running, I think you'll find.
> It can take some time for reclaim to find pages to use, and meanwhile
> everything in the machine will back up behind it....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
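For anyone trying to picture the inversion Dave describes, it boils down to the classic AB-BA pattern in the toy program below (my own userspace illustration, not kernel code: the two pthread mutexes merely stand in for iprune_mutex and the XFS iolock, and the comments map each path onto the two chains in the report):

#include <pthread.h>

static pthread_mutex_t iprune = PTHREAD_MUTEX_INITIALIZER; /* stands in for iprune_mutex */
static pthread_mutex_t iolock = PTHREAD_MUTEX_INITIALIZER; /* stands in for ip->i_iolock */

/* Chain #1: inode-cache pruning takes iprune_mutex, then ends up in
 * xfs_ilock() while reclaiming an XFS inode: iprune -> iolock. */
static void *prune_path(void *unused)
{
        pthread_mutex_lock(&iprune);   /* shrink_icache_memory() */
        pthread_mutex_lock(&iolock);   /* xfs_ilock() via xfs_ireclaim() */
        pthread_mutex_unlock(&iolock);
        pthread_mutex_unlock(&iprune);
        return NULL;
}

/* Chain #0: nfsd already holds the iolock for a read when readahead
 * allocates memory and direct reclaim re-enters shrink_icache_memory():
 * iolock -> iprune, i.e. the reverse order. */
static void *read_path(void *unused)
{
        pthread_mutex_lock(&iolock);   /* xfs_ilock() for the splice read */
        pthread_mutex_lock(&iprune);   /* reclaim taking iprune_mutex */
        pthread_mutex_unlock(&iprune);
        pthread_mutex_unlock(&iolock);
        return NULL;
}

int main(void)
{
        pthread_t a, b;
        pthread_create(&a, NULL, prune_path, NULL);
        pthread_create(&b, NULL, read_path, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}

With plain mutexes these two threads really can deadlock if they interleave badly, and that shape is exactly what lockdep flags. Per Dave, the real XFS code paths can never actually meet like this, which is why he calls it a false positive.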