writing to rbd mapped device produces hang tasks

Hi guys, 

Following up on my previous message: I've repeated the same experiment with the 3.16.2 kernel and I still see the same behaviour. The dd process got stuck after three consecutive runs. iostat shows the rbd0 device as fully utilised, but with no actual activity on the device itself: 

Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
rbd0       0.00    0.00  0.00  0.00   0.00   0.00      0.00    135.00   0.00     0.00     0.00   0.00  100.00
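If it helps with the debugging, something along these lines should show whether requests are sitting unanswered in the kernel client's OSD queue (a sketch: it assumes debugfs is mounted at /sys/kernel/debug and an rbd device is currently mapped; the cluster fsid directory name will differ):

```shell
# Sketch: dump the kernel rbd client's in-flight OSD requests from
# debugfs. Each line in osdc is one outstanding request; requests that
# never drain would match the 100% util / zero throughput iostat picture.
result=$(cat /sys/kernel/debug/ceph/*/osdc 2>/dev/null)
if [ -z "$result" ]; then
  # Fall back to a message when debugfs is absent or nothing is mapped.
  result="no osdc data (debugfs not mounted or no rbd device mapped)"
fi
echo "$result"
```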

I've also tried enabling and disabling the rbd cache, which made no difference. 

Could someone help me with debugging the issue and getting to the root cause? 

Thanks 

Andrei 
----- Original Message -----

From: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx> 
To: ceph-users at lists.ceph.com 
Sent: Sunday, 14 September, 2014 12:04:15 AM 
Subject: writing to rbd mapped device produces hang tasks 


Hello guys, 

I've been trying to map an rbd disk to run some testing, and I've noticed that while I can successfully read from the rbd image mapped to /dev/rbdX, I am failing to write to it reliably. Sometimes the write tests work perfectly well, especially with large block sizes, but often writes hang for a considerable amount of time and I get kernel hung-task messages (shown below) in dmesg. The hung tasks appear particularly frequently with a 4K block size; with larger block sizes writes also hang occasionally, but far less often. 

I am using simple dd tests like: dd if=/dev/zero of=<mountpoint>/<file> bs=4K count=250K. 
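The test can be wrapped in a loop so a hang shows up as a run that never prints its completion line. A sketch (TESTFILE is a stand-in for a file on the filesystem mounted from /dev/rbd0; it defaults to a temp file, and the count is scaled down from 250K, just so the script runs anywhere):

```shell
# Repro sketch: repeat the 4K-block write, forcing data to disk with
# conv=fsync, and timestamp each run so a stuck write is visible as a
# run that starts but never finishes.
TESTFILE=${TESTFILE:-$(mktemp)}
for i in 1 2 3; do
  echo "run $i started:  $(date)"
  dd if=/dev/zero of="$TESTFILE" bs=4K count=1K conv=fsync 2>/dev/null
  echo "run $i finished: $(date)"
done
```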

I am running firefly on Ubuntu 12.04 on all osd/mon servers. The rbd image is mapped on one of the osd servers. All osd servers are running kernel version 3.15.10-031510-generic. 


Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.439974] INFO: task jbd2/rbd0-8:3505 blocked for more than 120 seconds. 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.441586] Not tainted 3.15.10-031510-generic #201408132333 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.443022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444862] jbd2/rbd0-8 D 0000000000000003 0 3505 2 0x00000000 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444870] ffff8803a10a7c48 0000000000000002 ffff88007963b288 ffff8803a10a7fd8 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444874] 0000000000014540 0000000000014540 ffff880469f63260 ffff880866969930 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444876] ffff8803a10a7c58 ffff8803a10a7d88 ffff88034d142100 ffff880848519824 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444879] Call Trace: 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444893] [<ffffffff81778449>] schedule+0x29/0x70 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444901] [<ffffffff812a0360>] jbd2_journal_commit_transaction+0x240/0x1510 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444908] [<ffffffff810a5045>] ? sched_clock_cpu+0x85/0xc0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444920] [<ffffffff810a5cba>] ? arch_vtime_task_switch+0x8a/0x90 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444923] [<ffffffff810a5cfd>] ? vtime_common_task_switch+0x3d/0x50 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444928] [<ffffffff810b5420>] ? __wake_up_sync+0x20/0x20 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444933] [<ffffffff81079f9f>] ? try_to_del_timer_sync+0x4f/0x70 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444938] [<ffffffff812a4e88>] kjournald2+0xb8/0x240 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444941] [<ffffffff810b5420>] ? __wake_up_sync+0x20/0x20 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444943] [<ffffffff812a4dd0>] ? commit_timeout+0x10/0x10 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444949] [<ffffffff81091439>] kthread+0xc9/0xe0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444952] [<ffffffff81091370>] ? flush_kthread_worker+0xb0/0xb0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444965] [<ffffffff8178567c>] ret_from_fork+0x7c/0xb0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444969] [<ffffffff81091370>] ? flush_kthread_worker+0xb0/0xb0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.445141] INFO: task dd:21180 blocked for more than 120 seconds. 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.446595] Not tainted 3.15.10-031510-generic #201408132333 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.448070] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449910] dd D 0000000000000002 0 21180 19562 0x00000002 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449913] ffff880485a1b5d8 0000000000000002 ffff880485a1b5e8 ffff880485a1bfd8 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449916] 0000000000014540 0000000000014540 ffff880341833260 ffff88011086cb90 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449919] ffff880485a1b5a8 ffff88046fc94e40 ffff88011086cb90 ffffffff81204da0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449921] Call Trace: 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449927] [<ffffffff81204da0>] ? __wait_on_buffer+0x30/0x30 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449930] [<ffffffff81778449>] schedule+0x29/0x70 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449934] [<ffffffff8177851f>] io_schedule+0x8f/0xd0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449936] [<ffffffff81204dae>] sleep_on_buffer+0xe/0x20 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449940] [<ffffffff81778be2>] __wait_on_bit+0x62/0x90 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449945] [<ffffffff8120a110>] ? bio_alloc_bioset+0xa0/0x1d0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449947] [<ffffffff81204da0>] ? __wait_on_buffer+0x30/0x30 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449951] [<ffffffff81778c8c>] out_of_line_wait_on_bit+0x7c/0x90 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449954] [<ffffffff810b54a0>] ? wake_atomic_t_function+0x40/0x40 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449957] [<ffffffff81204d9e>] __wait_on_buffer+0x2e/0x30 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449962] [<ffffffff8124eb28>] ext4_wait_block_bitmap+0xd8/0xe0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449969] [<ffffffff812888de>] ext4_mb_init_cache+0x1de/0x750 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449974] [<ffffffff8115fb65>] ? find_or_create_page+0x65/0xb0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449977] [<ffffffff81288f0e>] ext4_mb_init_group+0xbe/0x130 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449984] [<ffffffff8134bb59>] ? __get_request+0x339/0x560 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449987] [<ffffffff8128948c>] ext4_mb_load_buddy+0x33c/0x360 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449990] [<ffffffff81289bf3>] ext4_mb_find_by_goal+0xa3/0x310 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449993] [<ffffffff8128a60e>] ext4_mb_regular_allocator+0x5e/0x450 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449996] [<ffffffff8127fcf0>] ? ext4_ext_find_extent+0x220/0x2a0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449999] [<ffffffff8128db48>] ext4_mb_new_blocks+0x3f8/0x570 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450002] [<ffffffff8127fbf0>] ? ext4_ext_find_extent+0x120/0x2a0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450004] [<ffffffff8127cabc>] ? ext4_ext_check_overlap.isra.28+0xbc/0xd0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450007] [<ffffffff812839a3>] ext4_ext_map_blocks+0xa13/0xb80 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450010] [<ffffffff81290800>] ? mext_replace_branches+0x220/0x490 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450014] [<ffffffff8125781f>] ext4_map_blocks+0x15f/0x550 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450017] [<ffffffff81256110>] ? mpage_prepare_extent_to_map+0x2d0/0x320 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450020] [<ffffffff81257c7e>] mpage_map_one_extent+0x6e/0x1a0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450023] [<ffffffff8125c34b>] mpage_map_and_submit_extent+0x4b/0x1e0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450026] [<ffffffff8125ca07>] ext4_writepages+0x527/0x7b0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450032] [<ffffffff8109f1f5>] ? check_preempt_curr+0x75/0xa0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450035] [<ffffffff810a2742>] ? default_wake_function+0x12/0x20 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450038] [<ffffffff810b5436>] ? autoremove_wake_function+0x16/0x40 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450044] [<ffffffff8116b3a0>] do_writepages+0x20/0x40 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450046] [<ffffffff81160229>] __filemap_fdatawrite_range+0x59/0x60 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450049] [<ffffffff81160abc>] filemap_flush+0x1c/0x20 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450051] [<ffffffff81258129>] ext4_alloc_da_blocks+0x49/0xa0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450055] [<ffffffff81250d39>] ext4_release_file+0x79/0xc0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450059] [<ffffffff811d477d>] __fput+0xbd/0x250 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450061] [<ffffffff811d495e>] ____fput+0xe/0x10 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450066] [<ffffffff8108df37>] task_work_run+0xa7/0xe0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450072] [<ffffffff81013e67>] do_notify_resume+0xc7/0xd0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450075] [<ffffffff817859ea>] int_signal+0x12/0x17 
Sep 13 21:24:45 arh-ibstorage2-ib kernel: [11894.974467] perf interrupt took too long (5003 > 5000), lowering kernel.perf_event_max_sample_rate to 25000 
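When things get stuck, the blocked tasks can also be grabbed on demand with ps rather than waiting for the 120-second hung-task watchdog; a sketch (wchan shows the kernel function each task is sleeping in, and output will vary by system):

```shell
# List tasks in uninterruptible sleep (D state) together with the
# kernel function they are blocked in; this complements the hung-task
# stack traces without waiting for the watchdog to fire.
blocked=$(ps -eo pid,stat,wchan:32,comm --no-headers | awk '$2 ~ /^D/')
if [ -z "$blocked" ]; then
  # Nothing is blocked at the moment this runs.
  blocked="no D-state tasks right now"
fi
echo "$blocked"
```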

Has anyone noticed similar behaviour? Is anyone using rbd mapped images in production? 

Thanks 

Andrei 

_______________________________________________ 
ceph-users mailing list 
ceph-users at lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


