Re: hugepage related lockdep trace.

On 07/18/2013 06:13 PM, Minchan Kim wrote:
On Thu, Jul 18, 2013 at 11:12:24PM +0530, Aneesh Kumar K.V wrote:
Minchan Kim <minchan@xxxxxxxxxx> writes:

Ccing the people get_maintainer suggests.

On Wed, Jul 17, 2013 at 11:32:23AM -0400, Dave Jones wrote:
[128095.470960] =================================
[128095.471315] [ INFO: inconsistent lock state ]
[128095.471660] 3.11.0-rc1+ #9 Not tainted
[128095.472156] ---------------------------------
[128095.472905] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[128095.473650] kswapd0/49 [HC0[0]:SC0[0]:HE1:SE1] takes:
[128095.474373]  (&mapping->i_mmap_mutex){+.+.?.}, at: [<c114971b>] page_referenced+0x87/0x5e3
[128095.475128] {RECLAIM_FS-ON-W} state was registered at:
[128095.475866]   [<c10a6232>] mark_held_locks+0x81/0xe7
[128095.476597]   [<c10a8db3>] lockdep_trace_alloc+0x5e/0xbc
[128095.477322]   [<c112316b>] __alloc_pages_nodemask+0x8b/0x9b6
[128095.478049]   [<c1123ab6>] __get_free_pages+0x20/0x31
[128095.478769]   [<c1123ad9>] get_zeroed_page+0x12/0x14
[128095.479477]   [<c113fe1e>] __pmd_alloc+0x1c/0x6b
[128095.480138]   [<c1155ea7>] huge_pmd_share+0x265/0x283
[128095.480138]   [<c1155f22>] huge_pte_alloc+0x5d/0x71
[128095.480138]   [<c115612e>] hugetlb_fault+0x7c/0x64a
[128095.480138]   [<c114087c>] handle_mm_fault+0x255/0x299
[128095.480138]   [<c15bbab0>] __do_page_fault+0x142/0x55c
[128095.480138]   [<c15bbed7>] do_page_fault+0xd/0x16
[128095.480138]   [<c15b927c>] error_code+0x6c/0x74
[128095.480138] irq event stamp: 3136917
[128095.480138] hardirqs last  enabled at (3136917): [<c15b8139>] _raw_spin_unlock_irq+0x27/0x50
[128095.480138] hardirqs last disabled at (3136916): [<c15b7f4e>] _raw_spin_lock_irq+0x15/0x78
[128095.480138] softirqs last  enabled at (3136180): [<c1048e4a>] __do_softirq+0x137/0x30f
[128095.480138] softirqs last disabled at (3136175): [<c1049195>] irq_exit+0xa8/0xaa
[128095.480138]
other info that might help us debug this:
[128095.480138]  Possible unsafe locking scenario:

[128095.480138]        CPU0
[128095.480138]        ----
[128095.480138]   lock(&mapping->i_mmap_mutex);
[128095.480138]   <Interrupt>
[128095.480138]     lock(&mapping->i_mmap_mutex);
[128095.480138]
  *** DEADLOCK ***

[128095.480138] no locks held by kswapd0/49.
[128095.480138]
stack backtrace:
[128095.480138] CPU: 1 PID: 49 Comm: kswapd0 Not tainted 3.11.0-rc1+ #9
[128095.480138] Hardware name: Dell Inc.                 Precision WorkStation 490    /0DT031, BIOS A08 04/25/2008
[128095.480138]  c1d32630 00000000 ee39fb18 c15b001e ee395780 ee39fb54 c15acdcb c1751845
[128095.480138]  c1751bbf 00000031 00000000 00000000 00000000 00000000 00000001 00000001
[128095.480138]  c1751bbf 00000008 ee395c44 00000100 ee39fb88 c10a6130 00000008 0000d8fb
[128095.480138] Call Trace:
[128095.480138]  [<c15b001e>] dump_stack+0x4b/0x79
[128095.480138]  [<c15acdcb>] print_usage_bug+0x1d9/0x1e3
[128095.480138]  [<c10a6130>] mark_lock+0x1e0/0x261
[128095.480138]  [<c10a5878>] ? check_usage_backwards+0x109/0x109
[128095.480138]  [<c10a6cde>] __lock_acquire+0x623/0x17f2
[128095.480138]  [<c107aa43>] ? sched_clock_cpu+0xcd/0x130
[128095.480138]  [<c107a7e8>] ? sched_clock_local+0x42/0x12e
[128095.480138]  [<c10a84cf>] lock_acquire+0x7d/0x195
[128095.480138]  [<c114971b>] ? page_referenced+0x87/0x5e3
[128095.480138]  [<c15b3671>] mutex_lock_nested+0x6c/0x3a7
[128095.480138]  [<c114971b>] ? page_referenced+0x87/0x5e3
[128095.480138]  [<c114971b>] ? page_referenced+0x87/0x5e3
[128095.480138]  [<c11661d5>] ? mem_cgroup_charge_statistics.isra.24+0x61/0x9e
[128095.480138]  [<c114971b>] page_referenced+0x87/0x5e3
[128095.480138]  [<f8433030>] ? raid0_congested+0x26/0x8a [raid0]
[128095.480138]  [<c112b9c7>] shrink_page_list+0x3d9/0x947
[128095.480138]  [<c10a6457>] ? trace_hardirqs_on+0xb/0xd
[128095.480138]  [<c112c3cf>] shrink_inactive_list+0x155/0x4cb
[128095.480138]  [<c112cd07>] shrink_lruvec+0x300/0x5ce
[128095.480138]  [<c112d028>] shrink_zone+0x53/0x14e
[128095.480138]  [<c112e531>] kswapd+0x517/0xa75
[128095.480138]  [<c112e01a>] ? mem_cgroup_shrink_node_zone+0x280/0x280
[128095.480138]  [<c10661ff>] kthread+0xa8/0xaa
[128095.480138]  [<c10a6457>] ? trace_hardirqs_on+0xb/0xd
[128095.480138]  [<c15bf737>] ret_from_kernel_thread+0x1b/0x28
[128095.480138]  [<c1066157>] ? insert_kthread_work+0x63/0x63
IMHO, it's a false positive: here i_mmap_mutex is taken by kswapd,
whereas the acquisition in the middle of the fault path can never happen
in kswapd context.

It seems lockdep's reclaim-over-fs tracking isn't smart enough to
distinguish between background and direct reclaim.

Let's wait for others' opinions.
Is that reasoning correct? We may not deadlock, because hugetlb pages
cannot be reclaimed, so the fault path in hugetlb won't end up
reclaiming pages from the same inode. But the report itself is correct,
right?
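
To illustrate why the report can be formally correct even if no deadlock
is reachable, here is a toy model of the check, assuming lockdep keeps
usage bits per lock class rather than per lock instance. The names
toy_lock_class and toy_mark_usage are hypothetical and are not lockdep's
real data structures; only the two state names come from the report
above.

```c
#include <assert.h>

/* Toy usage bits, named after the two states in the report above. */
enum {
	RECLAIM_FS_ON_W = 1 << 0, /* held while an allocation may enter FS reclaim */
	IN_RECLAIM_FS_W = 1 << 1, /* acquired from reclaim context (e.g. kswapd) */
};

/* Hypothetical stand-in for a lock class; not lockdep's real structure. */
struct toy_lock_class {
	const char *name;
	unsigned int usage;
};

/*
 * Record one usage of the class. Returns 1 once the class has been seen
 * in both states, which is the "inconsistent lock state" condition,
 * and 0 otherwise.
 */
static int toy_mark_usage(struct toy_lock_class *c, unsigned int bit)
{
	c->usage |= bit;
	return (c->usage & RECLAIM_FS_ON_W) && (c->usage & IN_RECLAIM_FS_W);
}
```

In this model, the hugetlb fault path sets RECLAIM_FS_ON_W on the class
and kswapd later sets IN_RECLAIM_FS_W on the same class, so the second
call reports the inconsistency regardless of which inode each mutex
belongs to.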


Looking at the hugetlb code we have in huge_pmd_share

out:
	pte = (pte_t *)pmd_alloc(mm, pud, addr);
	mutex_unlock(&mapping->i_mmap_mutex);
	return pte;

I guess we should move that pmd_alloc outside i_mmap_mutex. Otherwise
that pmd_alloc can trigger reclaim, which can call shrink_page_list?
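
The ordering issue can be sketched in user space. This is only an
analogy, under the assumption that reclaim needs the very same mutex:
a pthread mutex stands in for i_mmap_mutex, and toy_pmd_alloc,
reclaim_would_block, alloc_under_lock and alloc_after_unlock are
hypothetical names, not kernel functions. trylock is used so the sketch
reports the conflict instead of hanging.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space stand-in for mapping->i_mmap_mutex. */
static pthread_mutex_t i_mmap_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for reclaim (page_referenced() in the trace), which needs
 * the mutex. Returns 1 if reclaim would have blocked, 0 otherwise.
 */
static int reclaim_would_block(void)
{
	int rc = pthread_mutex_trylock(&i_mmap_mutex);

	if (rc != 0)
		return 1;	/* mutex is already held */
	rc = pthread_mutex_unlock(&i_mmap_mutex);
	assert(rc == 0);
	return 0;
}

/* Stand-in for pmd_alloc(): assume the allocation enters reclaim. */
static int toy_pmd_alloc(void)
{
	return reclaim_would_block();
}

/* Current ordering: allocate first, then drop the mutex. */
static int alloc_under_lock(void)
{
	int blocked;

	pthread_mutex_lock(&i_mmap_mutex);
	blocked = toy_pmd_alloc();
	pthread_mutex_unlock(&i_mmap_mutex);
	return blocked;
}

/* Proposed ordering: drop the mutex first, then allocate. */
static int alloc_after_unlock(void)
{
	pthread_mutex_lock(&i_mmap_mutex);
	/* ... the shared-pmd lookup would happen here ... */
	pthread_mutex_unlock(&i_mmap_mutex);
	return toy_pmd_alloc();
}
```

With the current ordering the toy reclaim finds the mutex held; with the
proposed ordering it does not. Whether real reclaim can ever reach the
same inode's mutex is exactly what is being questioned in this thread.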
True. Sorry, I didn't review the code carefully, and I was being overly
paranoid about reclaim-over-fs because of some internal work. :(

Could you explain more about reclaim-over-fs stuff?


Something like this?

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 83aff0a..2cb1be3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3266,8 +3266,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
  		put_page(virt_to_page(spte));
  	spin_unlock(&mm->page_table_lock);
  out:
-	pte = (pte_t *)pmd_alloc(mm, pud, addr);
  	mutex_unlock(&mapping->i_mmap_mutex);
+	pte = (pte_t *)pmd_alloc(mm, pud, addr);
  	return pte;
I am not familiar with hugetlb, but I am not sure this doesn't break eb48c071.
Michal?


  }
-aneesh


