Re: stalls with latest btrfs merge into 3.0-rc2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 06/13/2011 05:07 PM, Jim Schutt wrote:
Hi,

On a system under a heavy write load from multiple ceph OSDs,
I'm running into the following hung tasks where btrfs is implicated.
I'm running commit 3c25fa740e2 from Linus' tree merged with
commit cb9b41c92fa from git://ceph.newdream.net/git/ceph-client.git.

Let me know what else I can do to help sort this out.

[ 961.318047] INFO: task kworker/1:2:2346 blocked for more than 120
seconds.
[ 961.324993] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 961.332891] ffff8802253dbcd0 0000000000000046 ffff8802253dbcb0
ffff88021c138000
[ 961.340398] ffff880222c3dac0 ffff880222c3dac0 00000000000108c0
ffff88021c138000
[ 961.347893] ffff880222c3de80 0000000000000001 ffff8802253dbd00
ffffffff813b0f69
[ 961.355384] Call Trace:
[ 961.357838] [<ffffffff813b0f69>] schedule+0x164/0x19e
[ 961.363041] [<ffffffffa066069c>] btrfs_start_ordered_extent+0xa8/0xc4
[btrfs]
[ 961.370268] [<ffffffff81066544>] ? list_del_init+0x21/0x21
[ 961.376075] [<ffffffffa0660f93>] btrfs_wait_ordered_extents+0xd8/0x143
[btrfs]
[ 961.383387] [<ffffffffa0647065>] btrfs_commit_transaction+0x20b/0x5a4
[btrfs]
[ 961.390642] [<ffffffff81066544>] ? list_del_init+0x21/0x21
[ 961.396284] [<ffffffffa064741d>] do_async_commit+0x1f/0x2c [btrfs]
[ 961.402638] [<ffffffff81060bba>] process_one_work+0x124/0x1e0
[ 961.408478] [<ffffffffa06473fe>] ?
btrfs_commit_transaction+0x5a4/0x5a4 [btrfs]
[ 961.415917] [<ffffffff81062219>] ? destroy_workqueue+0x161/0x161
[ 961.422155] [<ffffffff810622a8>] worker_thread+0x8f/0x124
[ 961.427642] [<ffffffff8106603f>] kthread+0x72/0x7a
[ 961.432636] [<ffffffff813b9e94>] kernel_thread_helper+0x4/0x10
[ 961.438561] [<ffffffff813b272d>] ? retint_restore_args+0xe/0xe
[ 961.444530] [<ffffffff81065fcd>] ? kthread_bind+0x53/0x53
[ 961.450100] [<ffffffff813b9e90>] ? gs_change+0xb/0xb
[ 961.455188] INFO: task btrfs-transacti:7653 blocked for more than 120
seconds.
[ 961.462506] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 961.477512] ffff88021c24fd90 0000000000000046 ffff88021c24fd70
ffff880190a65ac0
[ 961.491735] ffff8802246cad60 ffff8802246cad60 00000000000108c0
ffff880190a65ac0
[ 961.505853] ffff8802246cb120 0000000000000001 ffff88021c24fdc0
ffffffff813b0f69
[ 961.521106] Call Trace:
[ 961.526052] [<ffffffff813b0f69>] schedule+0x164/0x19e
[ 961.531910] [<ffffffffa06460b4>] wait_current_trans+0xb8/0xec [btrfs]
[ 961.544101] [<ffffffff81066544>] ? list_del_init+0x21/0x21
[ 961.550394] [<ffffffffa0645cd2>] ? spin_lock+0xe/0x10 [btrfs]
[ 961.556471] [<ffffffffa0647970>] start_transaction+0xd1/0x206 [btrfs]
[ 961.563023] [<ffffffffa0643ec7>] ?
btree_readpage_end_io_hook+0x192/0x192 [btrfs]
[ 961.570597] [<ffffffffa0647ae8>] btrfs_join_transaction+0x15/0x17 [btrfs]
[ 961.577609] [<ffffffffa064401b>] transaction_kthread+0x154/0x22d [btrfs]
[ 961.584465] [<ffffffff81036a33>] ? need_resched+0x23/0x2d
[ 961.589967] [<ffffffff8106603f>] kthread+0x72/0x7a
[ 961.594843] [<ffffffff813b9e94>] kernel_thread_helper+0x4/0x10
[ 961.600777] [<ffffffff813b272d>] ? retint_restore_args+0xe/0xe
[ 961.606702] [<ffffffff81065fcd>] ? kthread_bind+0x53/0x53
[ 961.612206] [<ffffffff813b9e90>] ? gs_change+0xb/0xb
[ 961.617322] INFO: task cosd:16719 blocked for more than 120 seconds.
[ 961.623702] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 961.631542] ffff8801bf163cb8 0000000000000086 ffff8801bf163c98
ffff880226db16b0
[ 961.639072] ffff8801df4f5ac0 ffff8801df4f5ac0 00000000000108c0
ffff880226db16b0
[ 961.646516] ffff8801df4f5e80 0000000000000001 ffff8801bf163ce8
ffffffff813b0f69
[ 961.653955] Call Trace:
[ 961.656483] [<ffffffff813b0f69>] schedule+0x164/0x19e
[ 961.661725] [<ffffffffa0645dbd>]
wait_current_trans_commit_start_and_unblock+0xa9/0xce [btrfs]
[ 961.670461] [<ffffffff81066544>] ? list_del_init+0x21/0x21
[ 961.676028] [<ffffffff81061512>] ? queue_work+0x1f/0x21
[ 961.681436] [<ffffffffa0647bbd>]
btrfs_commit_transaction_async+0xd3/0x115 [btrfs]
[ 961.689118] [<ffffffffa066d8b3>] create_snapshot+0xe5/0x177 [btrfs]
[ 961.695546] [<ffffffffa0671559>] btrfs_mksubvol+0xfa/0x167 [btrfs]
[ 961.702416] [<ffffffffa06716c5>]
btrfs_ioctl_snap_create_transid+0xff/0x121 [btrfs]
[ 961.710454] [<ffffffffa067176f>] btrfs_ioctl_snap_create_v2+0x88/0xea
[btrfs]
[ 961.718000] [<ffffffffa0671a40>] btrfs_ioctl+0x208/0x358 [btrfs]
[ 961.724088] [<ffffffff8112221a>] vfs_ioctl+0x1d/0x34
[ 961.729159] [<ffffffff81122895>] do_vfs_ioctl+0x171/0x17a
[ 961.734732] [<ffffffff811154bb>] ? fget_light+0x69/0x81
[ 961.740057] [<ffffffff811228fa>] sys_ioctl+0x5c/0x7c
[ 961.745333] [<ffffffffa0534af5>] ? jbd_free_handle+0x1b/0x1d [jbd]
[ 961.752028] [<ffffffff813b8dab>] system_call_fastpath+0x16/0x1b
[ 961.758429] INFO: task cosd:16720 blocked for more than 120 seconds.
[ 961.765416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 961.776419] ffff8801bf123bb8 0000000000000086 ffff8801bf123b98
ffff8801664bad60
[ 961.792604] ffff8801df4f2d60 ffff8801df4f2d60 00000000000108c0
ffff8801664bad60
[ 961.814250] ffff8801df4f3120 0000000000000000 ffff8801bf123be8
ffffffff813b0f69
[ 961.825356] Call Trace:
[ 961.830932] [<ffffffff813b0f69>] schedule+0x164/0x19e
[ 961.837403] [<ffffffffa06460b4>] wait_current_trans+0xb8/0xec [btrfs]
[ 961.844413] [<ffffffff81066544>] ? list_del_init+0x21/0x21
[ 961.850203] [<ffffffff81109fe6>] ? kmem_cache_alloc+0xad/0xb9
[ 961.856255] [<ffffffffa0647970>] start_transaction+0xd1/0x206 [btrfs]
[ 961.863351] [<ffffffffa0647c12>] btrfs_start_transaction+0x13/0x15
[btrfs]
[ 961.870549] [<ffffffffa064dc24>] btrfs_create+0x3b/0x197 [btrfs]
[ 961.876889] [<ffffffff8111edd8>] vfs_create+0x72/0x92
[ 961.882115] [<ffffffff8111f024>] do_last+0x22c/0x40b
[ 961.887195] [<ffffffff8111fd02>] path_openat+0xc0/0x2ef
[ 961.892581] [<ffffffff81121011>] do_filp_open+0x3d/0x87
[ 961.897888] [<ffffffff811e070c>] ? strncpy_from_user+0x43/0x4d
[ 961.903880] [<ffffffff8111e78d>] ? getname_flags+0x2e/0x80
[ 961.909463] [<ffffffff8111e711>] ? do_getname+0x14b/0x173
[ 961.915472] [<ffffffff8111e74f>] ? audit_getname+0x16/0x26
[ 961.921259] [<ffffffff8112a743>] ? spin_lock+0xe/0x10
[ 961.926480] [<ffffffff811122d1>] do_sys_open+0x69/0xae
[ 961.931759] [<ffffffff81112349>] sys_open+0x20/0x22
[ 961.936730] [<ffffffff813b8dab>] system_call_fastpath+0x16/0x1b
[ 961.942878] INFO: task cosd:16721 blocked for more than 120 seconds.
[ 961.949231] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 961.957398] ffff8801502dfb38 0000000000000086 ffff8801502dfb18
ffff8801dd19c410
[ 961.965782] ffff880150314410 ffff880150314410 00000000000108c0
ffff8801dd19c410
[ 961.973784] ffff8801503147d0 0000000000400000 ffff8801502dfb68
ffffffff813b0f69
[ 961.981702] Call Trace:
[ 961.984239] [<ffffffff813b0f69>] schedule+0x164/0x19e
[ 961.989445] [<ffffffffa06460b4>] wait_current_trans+0xb8/0xec [btrfs]
[ 961.995993] [<ffffffff81066544>] ? list_del_init+0x21/0x21
[ 962.001781] [<ffffffffa0647692>] btrfs_throttle+0x1f/0x21 [btrfs]
[ 962.008040] [<ffffffffa0655672>] __btrfs_buffered_write+0x29e/0x2dd
[btrfs]
[ 962.015217] [<ffffffffa06559df>] btrfs_file_aio_write+0x203/0x2df [btrfs]
[ 962.022152] [<ffffffffa06557dc>] ? __btrfs_direct_write+0x12b/0x12b
[btrfs]
[ 962.029240] [<ffffffff81113a17>] do_sync_readv_writev+0xc1/0x109
[ 962.035448] [<ffffffff8112c187>] ? mnt_dec_count+0x11/0x13
[ 962.041016] [<ffffffff81036a33>] ? need_resched+0x23/0x2d
[ 962.046796] [<ffffffff81036a4b>] ? should_resched+0xe/0x2f
[ 962.057128] [<ffffffff811986c3>] ? security_file_permission+0x2c/0x31
[ 962.068078] [<ffffffff81113d21>] ? rw_verify_area+0xac/0xdb
[ 962.073949] [<ffffffff811140a6>] do_readv_writev+0xbf/0x127
[ 962.079741] [<ffffffff8111e78d>] ? getname_flags+0x2e/0x80
[ 962.096720] [<ffffffff81106005>] ? compound_head+0x12/0x20
[ 962.103380] [<ffffffff81036a33>] ? need_resched+0x23/0x2d
[ 962.109497] [<ffffffff81036a4b>] ? should_resched+0xe/0x2f
[ 962.119919] [<ffffffff813b117f>] ? _cond_resched+0xe/0x22
[ 962.131022] [<ffffffff813b1895>] ? mutex_lock+0x16/0x34
[ 962.137370] [<ffffffff81114151>] vfs_writev+0x43/0x4e
[ 962.142531] [<ffffffff81114729>] sys_writev+0x4c/0x71
[ 962.148853] [<ffffffff813b8dab>] system_call_fastpath+0x16/0x1b
[ 962.155452] INFO: task btrfs-endio-wri:23430 blocked for more than 120
seconds.
[ 962.162885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 962.170846] ffff8801dd943c20 0000000000000046 ffff8801dd943c00
ffff8801df4f5ac0
[ 962.178326] ffff880182060000 ffff880182060000 00000000000108c0
ffff8801df4f5ac0
[ 962.185821] ffff8801820603c0 0000000000000001 ffff8801dd943c50
ffffffff813b0f69
[ 962.193386] Call Trace:
[ 962.195837] [<ffffffff813b0f69>] schedule+0x164/0x19e
[ 962.201044] [<ffffffffa06460b4>] wait_current_trans+0xb8/0xec [btrfs]
[ 962.208020] [<ffffffff81066544>] ? list_del_init+0x21/0x21
[ 962.213751] [<ffffffffa0645cd2>] ? spin_lock+0xe/0x10 [btrfs]
[ 962.219713] [<ffffffffa0647970>] start_transaction+0xd1/0x206 [btrfs]
[ 962.226247] [<ffffffffa0647ae8>] btrfs_join_transaction+0x15/0x17 [btrfs]
[ 962.233127] [<ffffffffa064ec90>] btrfs_finish_ordered_io+0x135/0x2c1
[btrfs]
[ 962.240308] [<ffffffffa064ee59>] btrfs_writepage_end_io_hook+0x3d/0x45
[btrfs]
[ 962.247648] [<ffffffffa0665e20>] end_bio_extent_writepage+0xaa/0x15a
[btrfs]
[ 962.255291] [<ffffffff8113d194>] bio_endio+0x2d/0x2f
[ 962.260613] [<ffffffffa0642c2c>] end_workqueue_fn+0x72/0x7b [btrfs]
[ 962.267510] [<ffffffffa066cd4f>] worker_loop+0x60/0x1f6 [btrfs]
[ 962.273543] [<ffffffff81033dad>] ? calc_load_account_idle+0xe/0x1d
[ 962.279841] [<ffffffffa066ccef>] ? run_ordered_completions+0x99/0x99
[btrfs]
[ 962.286968] [<ffffffff8106603f>] kthread+0x72/0x7a
[ 962.291965] [<ffffffff813b9e94>] kernel_thread_helper+0x4/0x10
[ 962.297888] [<ffffffff813b272d>] ? retint_restore_args+0xe/0xe
[ 962.303801] [<ffffffff81065fcd>] ? kthread_bind+0x53/0x53
[ 962.309278] [<ffffffff813b9e90>] ? gs_change+0xb/0xb


Ugh that's not good. I'll look at it tomorrow and send you something to test. Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux