pgd_lock must be taken with irqs enabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 28, 2011 at 01:44:01AM -0500, CAI Qian wrote:
> > > INFO: task pgrep:6039 blocked for more than 120 seconds.
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> > > message.
> > > pgrep D ffff887f606f1ab0 0 6039 6038 0x00000080
> > >  ffff8821e39c1ce0 0000000000000082 0000000000000246 0000000000000000
> > >  0000000000014d40 ffff887f606f1520 ffff887f606f1ab0 ffff8821e39c1fd8
> > >  ffff887f606f1ab8 0000000000014d40 ffff8821e39c0010 0000000000014d40
> > > Call Trace:
> > >  [<ffffffff814afeb5>] rwsem_down_failed_common+0xb5/0x140
> > >  [<ffffffff814aff75>] rwsem_down_read_failed+0x15/0x17
> > >  [<ffffffff81230174>] call_rwsem_down_read_failed+0x14/0x30
> > >  [<ffffffff814af504>] ? down_read+0x24/0x30
> > >  [<ffffffff8111f4dc>] access_process_vm+0x4c/0x200
> > >  [<ffffffff8113f3fe>] ? fallback_alloc+0x14e/0x270
> > >  [<ffffffff811afa4d>] proc_pid_cmdline+0x6d/0x120
> > >  [<ffffffff81137eba>] ? alloc_pages_current+0x9a/0x100
> > >  [<ffffffff811b037d>] proc_info_read+0xad/0xf0
> > >  [<ffffffff81154315>] vfs_read+0xc5/0x190
> > >  [<ffffffff811544e1>] sys_read+0x51/0x90
> > >  [<ffffffff8100bf82>] system_call_fastpath+0x16/0x1b
> > 
> > pgrep hung too, it's not just khugepaged hanging and it's not obvious
> > for now that khugepaged was guilty of forgetting an unlock, could be
> > the process deadlocked somewhere with the mmap_sem hold. Can you press
> > SYSRQ+T? Hopefully that will show the holder. Also is CONFIG_NUMA=y/n?
> Unfortunately, SYSRQ+T was not working. CONFIG_NUMA=y and this is an
> NUMA system as well.

I reviewed it again but it's unlikely the holder of the mmap_sem was
khugepaged. Something hung on the mmap_sem and pgrep and khugepaged
got blocked on it.

I'm however aware of a deadlock in pgd_lock, no idea if it's what
you're hitting but it worth fixing that one now!

x86 takes the pgd_lock by clearing irqs, and then it takes the
page_table_lock with irqs already off. It's always forbidden to keep
irqs off while taking the page_table_lock, because all IPIs are sent
for the tlb flushes with the page_table_lock held if PT locks are
disabled (NR_CPUS small) or if THP is on.  It's not THP bug, it's core
bug in pgd_lock that will trigger with PT locks disabled too without
THP: all those spin_lock_irqsave must become spin_lock. Either that or
the page_table_lock must not be taken with irqs off.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]