Re: ksmd hung tasks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> Oooh, that is unfair, I put that lru_add_drain_all() into ksmd just
> for you, and now you accuse it of hanging :)
Sorry, I hate to hear that this is just to fix for me which means my
tests too narrow to benefit others. :)

> More seriously, it looks like the draining never gets a chance to run
> on one (or more) of the cpus, presumably something else is keeping
> that cpu unnaturally busy.
> 
> I think this is either another manifestation of the same problem as in
> your "kswapd hung tasks" posting, or an example of kswapd itself too
> busy, as reported by David: see his patch to linux-mm today,
> 
> http://marc.info/?l=linux-mm&m=129582353117092&w=2
> 
> Or perhaps those two cases are themselves related, though kswapd
> appears to be too busy in one case, and prevented from getting busy in
> the other.
It is still hung tasks somewhere else after applied the patch.

INFO: task auditd:3278 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
auditd          D ffff88085af88fe0     0  3278      1 0x00000000
 ffff88085f0b1b08 0000000000000082 ffff88085f0b1ab8 ffffffff00000000
 0000000000014d40 ffff88085af88a80 ffff88085af88fe0 ffff88085f0b1fd8
 ffff88085af88fe8 0000000000014d40 ffff88085f0b0010 0000000000014d40
Call Trace:
 [<ffffffff810f5260>] ? sync_page+0x0/0x50
 [<ffffffff814a2330>] io_schedule+0x70/0xc0
 [<ffffffff810f52a0>] sync_page+0x40/0x50
 [<ffffffff814a2bef>] __wait_on_bit+0x5f/0x90
 [<ffffffff8107fd66>] ? autoremove_wake_function+0x16/0x40
 [<ffffffff810f5463>] wait_on_page_bit+0x73/0x80
 [<ffffffff8107fd90>] ? wake_bit_function+0x0/0x50
 [<ffffffff810f550a>] __lock_page_or_retry+0x3a/0x60
 [<ffffffff810f6537>] filemap_fault+0x2d7/0x4c0
 [<ffffffff81119694>] __do_fault+0x54/0x570
 [<ffffffff81119ca7>] handle_pte_fault+0xf7/0xb20
 [<ffffffff81115727>] ? __pte_alloc+0xd7/0xf0
 [<ffffffff8111a826>] handle_mm_fault+0x156/0x250
 [<ffffffff814a7da3>] do_page_fault+0x143/0x4b0
 [<ffffffff81185100>] ? sys_epoll_wait+0xa0/0x450
 [<ffffffff810587c0>] ? default_wake_function+0x0/0x20
 [<ffffffff810899f3>] ? ktime_get_ts+0xb3/0xf0
 [<ffffffff814a4b15>] page_fault+0x25/0x30
INFO: task irqbalance:3328 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
irqbalance      D ffff88105de3cfe0     0  3328      1 0x00000080
 ffff88105de91b08 0000000000000082 ffff88105de91ab8 ffffffff00000000
 0000000000014d40 ffff88105de3ca80 ffff88105de3cfe0 ffff88105de91fd8
 ffff88105de3cfe8 0000000000014d40 ffff88105de90010 0000000000014d40
Call Trace:
 [<ffffffff810f5260>] ? sync_page+0x0/0x50
 [<ffffffff814a2330>] io_schedule+0x70/0xc0
 [<ffffffff810f52a0>] sync_page+0x40/0x50
 [<ffffffff814a2bef>] __wait_on_bit+0x5f/0x90
 [<ffffffff810f5463>] wait_on_page_bit+0x73/0x80
 [<ffffffff8107fd90>] ? wake_bit_function+0x0/0x50
 [<ffffffff810f550a>] __lock_page_or_retry+0x3a/0x60
 [<ffffffff810f6537>] filemap_fault+0x2d7/0x4c0
 [<ffffffff81119694>] __do_fault+0x54/0x570
 [<ffffffff81119ca7>] handle_pte_fault+0xf7/0xb20
 [<ffffffff81115727>] ? __pte_alloc+0xd7/0xf0
 [<ffffffff8111a826>] handle_mm_fault+0x156/0x250
 [<ffffffff814a7da3>] do_page_fault+0x143/0x4b0
 [<ffffffff810848d4>] ? hrtimer_nanosleep+0xc4/0x180
 [<ffffffff810836f0>] ? hrtimer_wakeup+0x0/0x30
 [<ffffffff81084704>] ? hrtimer_start_range_ns+0x14/0x20
 [<ffffffff814a4b15>] page_fault+0x25/0x30
INFO: task rpcbind:3342 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rpcbind         D ffff880c5b0126e0     0  3342      1 0x00000080
 ffff880c5143bb08 0000000000000086 ffff880c5143bab8 ffffffff00000000
 0000000000014d40 ffff880c5b012180 ffff880c5b0126e0 ffff880c5143bfd8
 ffff880c5b0126e8 0000000000014d40 ffff880c5143a010 0000000000014d40
Call Trace:
 [<ffffffff810f5260>] ? sync_page+0x0/0x50
 [<ffffffff814a2330>] io_schedule+0x70/0xc0
 [<ffffffff810f52a0>] sync_page+0x40/0x50
 [<ffffffff814a2bef>] __wait_on_bit+0x5f/0x90
 [<ffffffff810f5463>] wait_on_page_bit+0x73/0x80
 [<ffffffff8107fd90>] ? wake_bit_function+0x0/0x50
 [<ffffffff810f550a>] __lock_page_or_retry+0x3a/0x60
 [<ffffffff810f6537>] filemap_fault+0x2d7/0x4c0
 [<ffffffff8115b110>] ? pollwake+0x0/0x60
 [<ffffffff81119694>] __do_fault+0x54/0x570
 [<ffffffff81119ca7>] handle_pte_fault+0xf7/0xb20
 [<ffffffff81115727>] ? __pte_alloc+0xd7/0xf0
 [<ffffffff8111a826>] handle_mm_fault+0x156/0x250
 [<ffffffff814a7da3>] do_page_fault+0x143/0x4b0
 [<ffffffff8116562f>] ? vfsmount_lock_global_unlock_online+0x4f/0x60
 [<ffffffff8116604c>] ? mntput_no_expire+0x19c/0x1c0
 [<ffffffff81013219>] ? read_tsc+0x9/0x20
 [<ffffffff810899f3>] ? ktime_get_ts+0xb3/0xf0
 [<ffffffff8115aecd>] ? poll_select_set_timeout+0x8d/0xa0
 [<ffffffff814a4b15>] page_fault+0x25/0x30
INFO: task hald:6507 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
hald            D ffff880c5bdb25a0     0  6507      1 0x00000080
 ffff880c5d3efb08 0000000000000086 ffff880c5d3efab8 ffffffff00000000
 0000000000014d40 ffff880c5bdb2040 ffff880c5bdb25a0 ffff880c5d3effd8
 ffff880c5bdb25a8 0000000000014d40 ffff880c5d3ee010 0000000000014d40
Call Trace:
 [<ffffffff810f5260>] ? sync_page+0x0/0x50
 [<ffffffff814a2330>] io_schedule+0x70/0xc0
 [<ffffffff810f52a0>] sync_page+0x40/0x50
 [<ffffffff814a2a9a>] __wait_on_bit_lock+0x5a/0xc0
 [<ffffffff810f5237>] __lock_page+0x67/0x70
 [<ffffffff8107fd90>] ? wake_bit_function+0x0/0x50
 [<ffffffff810f551d>] __lock_page_or_retry+0x4d/0x60
 [<ffffffff810f6537>] filemap_fault+0x2d7/0x4c0
 [<ffffffff81119694>] __do_fault+0x54/0x570
 [<ffffffff81119ca7>] handle_pte_fault+0xf7/0xb20
 [<ffffffff81115727>] ? __pte_alloc+0xd7/0xf0
 [<ffffffff8111a826>] handle_mm_fault+0x156/0x250
 [<ffffffff814a7da3>] do_page_fault+0x143/0x4b0
 [<ffffffff81013219>] ? read_tsc+0x9/0x20
 [<ffffffff810899f3>] ? ktime_get_ts+0xb3/0xf0
 [<ffffffff8115aecd>] ? poll_select_set_timeout+0x8d/0xa0
 [<ffffffff814a4b15>] page_fault+0x25/0x30
INFO: task automount:6568 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
automount       D ffff88085def46e0     0  6568      1 0x00000080
 ffff88085decfb08 0000000000000082 ffff88085decfab8 ffffffff00000000
 0000000000014d40 ffff88085def4180 ffff88085def46e0 ffff88085decffd8
 ffff88085def46e8 0000000000014d40 ffff88085dece010 0000000000014d40
Call Trace:
 [<ffffffff810f5260>] ? sync_page+0x0/0x50
 [<ffffffff814a2330>] io_schedule+0x70/0xc0
 [<ffffffff810f52a0>] sync_page+0x40/0x50
 [<ffffffff814a2bef>] __wait_on_bit+0x5f/0x90
 [<ffffffff810f5463>] wait_on_page_bit+0x73/0x80
 [<ffffffff8107fd90>] ? wake_bit_function+0x0/0x50
 [<ffffffff810f550a>] __lock_page_or_retry+0x3a/0x60
 [<ffffffff810f6537>] filemap_fault+0x2d7/0x4c0
 [<ffffffff81119694>] __do_fault+0x54/0x570
 [<ffffffff81119ca7>] handle_pte_fault+0xf7/0xb20
 [<ffffffff81115727>] ? __pte_alloc+0xd7/0xf0
 [<ffffffff8111a826>] handle_mm_fault+0x156/0x250
 [<ffffffff814a7da3>] do_page_fault+0x143/0x4b0
 [<ffffffff810933eb>] ? sys_futex+0x7b/0x180
 [<ffffffff814a4b15>] page_fault+0x25/0x30
INFO: task automount:6569 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
automount       D ffff88085d387b20     0  6569      1 0x00000080
 ffff88085d27bb08 0000000000000082 ffff88085d27bab8 ffffffff00000000
 0000000000014d40 ffff88085d3875c0 ffff88085d387b20 ffff88085d27bfd8
 ffff88085d387b28 0000000000014d40 ffff88085d27a010 0000000000014d40
Call Trace:
 [<ffffffff810f5260>] ? sync_page+0x0/0x50
 [<ffffffff814a2330>] io_schedule+0x70/0xc0
 [<ffffffff810f52a0>] sync_page+0x40/0x50
 [<ffffffff814a2bef>] __wait_on_bit+0x5f/0x90
 [<ffffffff810f5463>] wait_on_page_bit+0x73/0x80
 [<ffffffff8107fd90>] ? wake_bit_function+0x0/0x50
 [<ffffffff810f550a>] __lock_page_or_retry+0x3a/0x60
 [<ffffffff810f6537>] filemap_fault+0x2d7/0x4c0
 [<ffffffff81119694>] __do_fault+0x54/0x570
 [<ffffffff81119ca7>] handle_pte_fault+0xf7/0xb20
 [<ffffffff81115727>] ? __pte_alloc+0xd7/0xf0
 [<ffffffff8111a826>] handle_mm_fault+0x156/0x250
 [<ffffffff8104c846>] ? enqueue_task+0x66/0x80
 [<ffffffff814a7da3>] do_page_fault+0x143/0x4b0
 [<ffffffff8105dad0>] ? do_fork+0xe0/0x340
 [<ffffffff810933eb>] ? sys_futex+0x7b/0x180
 [<ffffffff814a4b15>] page_fault+0x25/0x30
INFO: task ntpd:6607 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ntpd            D ffff88105d9c9a20     0  6607      1 0x00000080
 ffff88105ea6db08 0000000000000086 ffff88105ea6dab8 ffffffff00000000
 0000000000014d40 ffff88105d9c94c0 ffff88105d9c9a20 ffff88105ea6dfd8
 ffff88105d9c9a28 0000000000014d40 ffff88105ea6c010 0000000000014d40
Call Trace:
 [<ffffffff810f5260>] ? sync_page+0x0/0x50
 [<ffffffff814a2330>] io_schedule+0x70/0xc0
 [<ffffffff810f52a0>] sync_page+0x40/0x50
 [<ffffffff814a2bef>] __wait_on_bit+0x5f/0x90
 [<ffffffff810f5463>] wait_on_page_bit+0x73/0x80
 [<ffffffff8107fd90>] ? wake_bit_function+0x0/0x50
 [<ffffffff810f550a>] __lock_page_or_retry+0x3a/0x60
 [<ffffffff810f6537>] filemap_fault+0x2d7/0x4c0
 [<ffffffff81119694>] __do_fault+0x54/0x570
 [<ffffffff81119ca7>] handle_pte_fault+0xf7/0xb20
 [<ffffffff81115727>] ? __pte_alloc+0xd7/0xf0
 [<ffffffff8111a826>] handle_mm_fault+0x156/0x250
 [<ffffffff814a7da3>] do_page_fault+0x143/0x4b0
 [<ffffffff8115ba41>] ? poll_select_copy_remaining+0x101/0x150
 [<ffffffff814a4b15>] page_fault+0x25/0x30
INFO: task master:6683 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
master          D ffff88105f9eb9a0     0  6683      1 0x00000080
 ffff88105e039b08 0000000000000082 ffff88105e039ab8 ffffffffa00040bc
 0000000000014d40 ffff88105f9eb440 ffff88105f9eb9a0 ffff88105e039fd8
 ffff88105f9eb9a8 0000000000014d40 ffff88105e038010 0000000000014d40
Call Trace:
 [<ffffffffa00040bc>] ? dm_table_unplug_all+0x5c/0x110 [dm_mod]
 [<ffffffff810f5260>] ? sync_page+0x0/0x50
 [<ffffffff814a2330>] io_schedule+0x70/0xc0
 [<ffffffff810f52a0>] sync_page+0x40/0x50
 [<ffffffff814a2bef>] __wait_on_bit+0x5f/0x90
 [<ffffffff810f5463>] wait_on_page_bit+0x73/0x80
 [<ffffffff8107fd90>] ? wake_bit_function+0x0/0x50
 [<ffffffff810f550a>] __lock_page_or_retry+0x3a/0x60
 [<ffffffff810f6537>] filemap_fault+0x2d7/0x4c0
 [<ffffffff81119694>] __do_fault+0x54/0x570
 [<ffffffff81119ca7>] handle_pte_fault+0xf7/0xb20
 [<ffffffff81115727>] ? __pte_alloc+0xd7/0xf0
 [<ffffffff8111a826>] handle_mm_fault+0x156/0x250
 [<ffffffff814a7da3>] do_page_fault+0x143/0x4b0
 [<ffffffff81185100>] ? sys_epoll_wait+0xa0/0x450
 [<ffffffff810587c0>] ? default_wake_function+0x0/0x20
 [<ffffffff814a4b15>] page_fault+0x25/0x30
INFO: task pickup:6690 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
pickup          D ffff88085e5a6fe0     0  6690   6683 0x00000080
 ffff88085fa2db08 0000000000000086 ffff88085fa2dab8 ffffffffa00040bc
 0000000000014d40 ffff88085e5a6a80 ffff88085e5a6fe0 ffff88085fa2dfd8
 ffff88085e5a6fe8 0000000000014d40 ffff88085fa2c010 0000000000014d40
Call Trace:
 [<ffffffffa00040bc>] ? dm_table_unplug_all+0x5c/0x110 [dm_mod]
 [<ffffffff810f5260>] ? sync_page+0x0/0x50
 [<ffffffff814a2330>] io_schedule+0x70/0xc0
 [<ffffffff810f52a0>] sync_page+0x40/0x50
 [<ffffffff814a2bef>] __wait_on_bit+0x5f/0x90
 [<ffffffff810f5463>] wait_on_page_bit+0x73/0x80
 [<ffffffff8107fd90>] ? wake_bit_function+0x0/0x50
 [<ffffffff810f550a>] __lock_page_or_retry+0x3a/0x60
 [<ffffffff810f6537>] filemap_fault+0x2d7/0x4c0
 [<ffffffff81119694>] __do_fault+0x54/0x570
 [<ffffffff81119ca7>] handle_pte_fault+0xf7/0xb20
 [<ffffffff81115727>] ? __pte_alloc+0xd7/0xf0
 [<ffffffff8111a826>] handle_mm_fault+0x156/0x250
 [<ffffffff814a7da3>] do_page_fault+0x143/0x4b0
 [<ffffffff81185100>] ? sys_epoll_wait+0xa0/0x450
 [<ffffffff810587c0>] ? default_wake_function+0x0/0x20
 [<ffffffff814a4b15>] page_fault+0x25/0x30
INFO: task abrtd:6694 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
abrtd           D ffff88045e53c6a0     0  6694      1 0x00000080
 ffff88045cdb9b08 0000000000000082 ffff88045cdb9ab8 ffffffff00000000
 0000000000014d40 ffff88045e53c140 ffff88045e53c6a0 ffff88045cdb9fd8
 ffff88045e53c6a8 0000000000014d40 ffff88045cdb8010 0000000000014d40
Call Trace:
 [<ffffffff810f5260>] ? sync_page+0x0/0x50
 [<ffffffff814a2330>] io_schedule+0x70/0xc0
 [<ffffffff810f52a0>] sync_page+0x40/0x50
 [<ffffffff814a2bef>] __wait_on_bit+0x5f/0x90
 [<ffffffff810f5463>] wait_on_page_bit+0x73/0x80
 [<ffffffff8107fd90>] ? wake_bit_function+0x0/0x50
 [<ffffffff810f550a>] __lock_page_or_retry+0x3a/0x60
 [<ffffffff810f6537>] filemap_fault+0x2d7/0x4c0
 [<ffffffff8115b110>] ? pollwake+0x0/0x60
 [<ffffffff81119694>] __do_fault+0x54/0x570
 [<ffffffff81119ca7>] handle_pte_fault+0xf7/0xb20
 [<ffffffff81115727>] ? __pte_alloc+0xd7/0xf0
 [<ffffffff8111a826>] handle_mm_fault+0x156/0x250
 [<ffffffff814a7da3>] do_page_fault+0x143/0x4b0
 [<ffffffff81013219>] ? read_tsc+0x9/0x20
 [<ffffffff810899f3>] ? ktime_get_ts+0xb3/0xf0
 [<ffffffff8115aecd>] ? poll_select_set_timeout+0x8d/0xa0
 [<ffffffff814a4b15>] page_fault+0x25/0x30

Thanks.

CAI Qian

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]