Hi all,
I have run into an issue in my environment: qemu-kvm guests hang on disk
writes when using rbd storage.
My environment:
ceph version: 0.80.7
ceph osds: 11 hosts * 10 osds per host = 110 osds
qemu version: 2.0+
My operating steps:
ceph osd crush add-bucket ssd root
ceph osd getcrushmap -o mycrushmap
crushtool -d mycrushmap -o mycrushmap_v1
# modify mycrushmap_v1: add 4 of the 11 hosts under the new root=ssd
# (all 11 hosts also stay under root=default);
# a rough sketch of the added fragment is shown after these steps
crushtool -c mycrushmap_v1 -o mycrushmap_input
ceph osd setcrushmap -i mycrushmap_input
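For reference, the fragment I added in mycrushmap_v1 looked roughly like
this (the bucket id, host names and weights below are placeholders, not my
real values):

root ssd {
        id -10          # placeholder bucket id
        alg straw
        hash 0          # rjenkins1
        item host-ssd-01 weight 10.000
        item host-ssd-02 weight 10.000
        item host-ssd-03 weight 10.000
        item host-ssd-04 weight 10.000
}

The four hosts remain items under root=default as well; I only gave them a
second parent under root=ssd.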
After I performed the steps above, all the qemu-kvm VMs in my environment
with ceph rbd storage attached hung. The kernel log shows:
kernel: INFO: task jbd2/sdb1-8:623 blocked for more than 120 seconds.
kernel: Not tainted 2.6.32-431.3.1.el6.x86_64 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kernel: jbd2/sdb1-8 D 0000000000000001 0 623 2 0x00000000
kernel: ffff88011c44dc20 0000000000000046 ffff8801ffffffff 00000000cc70801d
kernel: ffff88011c44db90 ffff880119466980 00000000d127ef64 ffffffffac2de373
kernel: ffff880119538638 ffff88011c44dfd8 000000000000fbc8 ffff880119538638
kernel: Call Trace:
In the meantime, ceph.log shows everything working fine and the ceph
health is OK. The other guest VMs, which do not use ceph rbd storage, are
fine.
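To be precise, by "the ceph health is OK" I mean the standard checks on a
monitor node, roughly:

        ceph health
        ceph -s

both report HEALTH_OK while the VMs are hung.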
I have tried the same steps many times in my testing environment, but I
cannot reproduce the hang there, so maybe the steps themselves are not the
problem.
Is there any known defect/bug related to this issue? Or any suggestions to
help me find the root cause?
Thanks very much.