On Wed, Sep 24, 2014 at 12:12 PM, Micha Krause <micha at krausam.de> wrote:
> Hi,
>
> I was able to get a dmesg output from the CentOS machine with kernel 3.16:
>
> kworker/3:2:9521 blocked for more than 120 seconds.
> Not tainted 3.16.2-1.el6.elrepo.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/3:2     D 0000000000000003     0  9521      2 0x00000080
> Workqueue: events handle_timeout [libceph]
> ffff8801228cfcd8 0000000000000046 0000000300000000 ffff8801228cc010
> 0000000000014400 0000000000014400 ffff8800ba01c250 ffff880234ed3070
> 0000000000000000 ffff8800baf237c8 ffff8800baf237cc ffff8800ba01c250
> Call Trace:
> [<ffffffff81647da9>] schedule+0x29/0x70
> [<ffffffff81647f0e>] schedule_preempt_disabled+0xe/0x10
> [<ffffffff8164987b>] __mutex_lock_slowpath+0xdb/0x1d0
> [<ffffffff81649993>] mutex_lock+0x23/0x40
> [<ffffffffa0348c73>] handle_timeout+0x63/0x1c0 [libceph]
> [<ffffffff8108d60c>] process_one_work+0x17c/0x420
> [<ffffffff8108e7d3>] worker_thread+0x123/0x420
> [<ffffffff8108e6b0>] ? maybe_create_worker+0x180/0x180
> [<ffffffff810943be>] kthread+0xce/0xf0
> [<ffffffff810942f0>] ? kthread_freezable_should_stop+0x70/0x70
> [<ffffffff8164b5bc>] ret_from_fork+0x7c/0xb0
> [<ffffffff810942f0>] ? kthread_freezable_should_stop+0x70/0x70
> INFO: task kworker/3:1:62 blocked for more than 120 seconds.
> Not tainted 3.16.2-1.el6.elrepo.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/3:1     D 0000000000000003     0    62      2 0x00000000
> Workqueue: events handle_osds_timeout [libceph]
> ffff880037907ce8 0000000000000046 0000000000000000 ffff880037904010
> 0000000000014400 0000000000014400 ffff880232389130 ffff880234ed3070
> ffffffff8101d833 ffff8800baf237c8 ffff8800baf237cc ffff880232389130
> Call Trace:
> [<ffffffff8101d833>] ? native_sched_clock+0x33/0xd0
> [<ffffffff81647da9>] schedule+0x29/0x70
> [<ffffffff81647f0e>] schedule_preempt_disabled+0xe/0x10
> [<ffffffff8164987b>] __mutex_lock_slowpath+0xdb/0x1d0
> [<ffffffff810afbbf>] ? put_prev_entity+0x2f/0x400
> [<ffffffff81649993>] mutex_lock+0x23/0x40
> [<ffffffffa0347003>] handle_osds_timeout+0x53/0x120 [libceph]
> [<ffffffff8108d60c>] process_one_work+0x17c/0x420
> [<ffffffff8108e7d3>] worker_thread+0x123/0x420
> [<ffffffff8108e6b0>] ? maybe_create_worker+0x180/0x180
> [<ffffffff810943be>] kthread+0xce/0xf0
> [<ffffffff810942f0>] ? kthread_freezable_should_stop+0x70/0x70
> [<ffffffff8164b5bc>] ret_from_fork+0x7c/0xb0
> [<ffffffff810942f0>] ? kthread_freezable_should_stop+0x70/0x70
> INFO: task kworker/u8:0:9486 blocked for more than 120 seconds.
> Not tainted 3.16.2-1.el6.elrepo.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u8:0    D 0000000000000002     0  9486      2 0x00000080
> Workqueue: writeback bdi_writeback_workfn (flush-253:7)
> ffff8802337cf368 0000000000000046 00000000ae5d42c1 ffff8802337cc010
> 0000000000014400 0000000000014400 ffff880232554fb0 ffff8800ba4be210
> ffff8802337cc010 ffff8800ba5579b8 ffff880232fa0250 ffff880232554fb0
> Call Trace:
> [<ffffffff81647da9>] schedule+0x29/0x70
> [<ffffffff81647f0e>] schedule_preempt_disabled+0xe/0x10
> [<ffffffff81649962>] __mutex_lock_slowpath+0x1c2/0x1d0
> [<ffffffff81649993>] mutex_lock+0x23/0x40
> [<ffffffffa033fc2d>] ceph_con_send+0x4d/0x150 [libceph]
> [<ffffffffa0348bc4>] __send_queued+0x134/0x180 [libceph]
> [<ffffffffa0349e7b>] __ceph_osdc_start_request+0x5b/0xb0 [libceph]
> [<ffffffffa0349f21>] ceph_osdc_start_request+0x51/0x80 [libceph]
> [<ffffffffa037b2a0>] rbd_img_obj_request_submit+0xb0/0x110 [rbd]
> [<ffffffffa037b349>] rbd_img_request_submit+0x49/0x60 [rbd]
> [<ffffffffa037bcd8>] rbd_request_fn+0x248/0x2b0 [rbd]
> [<ffffffff812b22e7>] __blk_run_queue+0x37/0x50
> [<ffffffff812b296e>] queue_unplugged+0x4e/0xb0
> [<ffffffff812b2b2e>] blk_flush_plug_list+0x15e/0x200
> [<ffffffff81647e65>] io_schedule+0x75/0xd0
> [<ffffffff812b3f87>] get_request+0x167/0x340
> [<ffffffff810b6220>] ? bit_waitqueue+0xe0/0xe0
> [<ffffffff812ae78b>] ? elv_merge+0xeb/0xf0
> [<ffffffff812b4228>] blk_queue_bio+0xc8/0x340
> [<ffffffff812b30f0>] generic_make_request+0xc0/0x100
> [<ffffffff812b31b0>] submit_bio+0x80/0x170
> [<ffffffff812abdf1>] ? bio_alloc_bioset+0xa1/0x1e0
> [<ffffffff811ff4a6>] _submit_bh+0x146/0x220
> [<ffffffff811ff590>] submit_bh+0x10/0x20
> [<ffffffff81202ed3>] __block_write_full_page.clone.0+0x1a3/0x340
> [<ffffffff81203790>] ? I_BDEV+0x10/0x10
> [<ffffffff81203790>] ? I_BDEV+0x10/0x10
> [<ffffffff81203246>] block_write_full_page+0xc6/0x100
> [<ffffffff81204848>] blkdev_writepage+0x18/0x20
> [<ffffffff81163be7>] __writepage+0x17/0x50
> [<ffffffff81164fe4>] write_cache_pages+0x244/0x510
> [<ffffffff81163bd0>] ? set_page_dirty+0x60/0x60
> [<ffffffff81165301>] generic_writepages+0x51/0x80
> [<ffffffff81165350>] do_writepages+0x20/0x40
> [<ffffffff811f6309>] __writeback_single_inode+0x49/0x230
> [<ffffffff810b665f>] ? wake_up_bit+0x2f/0x40
> [<ffffffff811f7149>] writeback_sb_inodes+0x279/0x390
> [<ffffffff811d03d5>] ? put_super+0x25/0x40
> [<ffffffff811f72fe>] __writeback_inodes_wb+0x9e/0xd0
> [<ffffffff811f752b>] wb_writeback+0x1fb/0x2c0
> [<ffffffff811f76f0>] wb_do_writeback+0x100/0x1f0
> [<ffffffff811f7a60>] bdi_writeback_workfn+0x70/0x210
> [<ffffffff8108d60c>] process_one_work+0x17c/0x420
> [<ffffffff8108e7d3>] worker_thread+0x123/0x420
> [<ffffffff8108e6b0>] ? maybe_create_worker+0x180/0x180
> [<ffffffff810943be>] kthread+0xce/0xf0
> [<ffffffff810942f0>] ? kthread_freezable_should_stop+0x70/0x70
> [<ffffffff8164b5bc>] ret_from_fork+0x7c/0xb0
> [<ffffffff810942f0>] ? kthread_freezable_should_stop+0x70/0x70

Sorry, this is a known rbd deadlock in 3.15/3.16.  3.16.3 has a fix.

I'd be very interested to see something similar for 3.13.

Thanks,

                Ilya
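For anyone trying to capture a comparable report from a 3.13 machine, a rough
sketch of the usual steps (assumes the sysrq interface is enabled on that box;
the hung_task sysctl is the same one mentioned in the dmesg output above):

    # confirm which kernel is actually running (3.15/3.16 are affected,
    # 3.16.3 carries the fix)
    uname -r

    # hung-task reports like the ones above fire after 120s by default;
    # writing 0 here disables them, a larger value changes the threshold
    cat /proc/sys/kernel/hung_task_timeout_secs

    # dump backtraces of all blocked (D-state) tasks on demand via sysrq,
    # then pull them out of the kernel log
    echo w > /proc/sysrq-trigger
    dmesg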