Hi,

I'm running a 3-node cluster with 126 OSDs in total under CentOS 6.5, with ceph version 0.83 (78ff1f0a5dfd3c5850805b4021738564c36c92b8). On the client side it's 0.83 too, with kernel 3.16.0-1.el6.elrepo.x86_64.

rbd showmapped
id pool   image           snap device
0  SAS-r2 sas2-r2-1T-4m.0 -    /dev/rbd0
1  SAS-r2 sas2-r2-1T-4m.1 -    /dev/rbd1
2  SAS-r2 sas2-r2-1T-4m.2 -    /dev/rbd2

After a couple of minutes (trying to fill the 1 TB volume),

fio --filename=/dev/rbd0 --direct=1 --rw=write --bs=8M --size=8G --numjobs=128 --offset_increment=8G --runtime=3600 --group_reporting --name=file1

got stuck.
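(The job layout is meant to cover the whole device: with --numjobs=128, --size=8G and --offset_increment=8G each job writes its own 8 GiB region, and 128 x 8 GiB = 1024 GiB = 1 TiB, i.e. the full volume.)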
/var/log/messages:

(...)
Aug 7 19:22:34 rx37-0 kernel: libceph: osd118 192.168.113.54:6902 socket closed (con state OPEN)
Aug 7 19:22:34 rx37-0 kernel: libceph: osd40 192.168.113.52:6920 socket closed (con state OPEN)
Aug 7 19:22:34 rx37-0 kernel: libceph: osd109 192.168.113.54:6875 socket closed (con state OPEN)
Aug 7 19:22:34 rx37-0 kernel: libceph: osd67 192.168.113.53:6875 socket closed (con state OPEN)
Aug 7 19:22:34 rx37-0 kernel: libceph: osd37 192.168.113.52:6911 socket closed (con state OPEN)
Aug 7 19:22:34 rx37-0 kernel: libceph: osd98 192.168.113.54:6842 socket closed (con state OPEN)
Aug 7 19:22:34 rx37-0 kernel: libceph: osd26 192.168.113.52:6878 socket closed (con state OPEN)
Aug 7 19:24:43 rx37-0 kernel: INFO: task kworker/2:0:19 blocked for more than 120 seconds.
Aug 7 19:24:43 rx37-0 kernel: Not tainted 3.16.0-1.el6.elrepo.x86_64 #1
Aug 7 19:24:43 rx37-0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 7 19:24:43 rx37-0 kernel: kworker/2:0 D 0000000000000002 0 19 2 0x00000000
Aug 7 19:24:43 rx37-0 kernel: Workqueue: ceph-msgr con_work [libceph]
Aug 7 19:24:43 rx37-0 kernel: ffff8810307bfb68 0000000000000046 ffff8810307bfb18 ffff8810307bc010
Aug 7 19:24:43 rx37-0 kernel: 0000000000014380 0000000000014380 ffff8810307ae390 ffff880079678250
Aug 7 19:24:43 rx37-0 kernel: 0000003500004040 ffff88102a1fd7c8 ffff88102a1fd7cc ffff8810307ae390
Aug 7 19:24:43 rx37-0 kernel: Call Trace:
Aug 7 19:24:43 rx37-0 kernel: [<ffffffff81647629>] schedule+0x29/0x70
Aug 7 19:24:43 rx37-0 kernel: [<ffffffff8164778e>] schedule_preempt_disabled+0xe/0x10
Aug 7 19:24:43 rx37-0 kernel: [<ffffffff816490fb>] __mutex_lock_slowpath+0xdb/0x1d0
Aug 7 19:24:43 rx37-0 kernel: [<ffffffff81649213>] mutex_lock+0x23/0x40
Aug 7 19:24:43 rx37-0 kernel: [<ffffffffa0615e0f>] get_reply+0x3f/0x200 [libceph]
Aug 7 19:24:43 rx37-0 kernel: [<ffffffffa0616058>] alloc_msg+0x88/0x90 [libceph]
Aug 7 19:24:43 rx37-0 kernel: [<ffffffffa060d8f1>] ceph_con_in_msg_alloc+0x71/0x240 [libceph]
Aug 7 19:24:43 rx37-0 kernel: [<ffffffffa060eba8>] read_partial_message+0x1e8/0x3d0 [libceph]
Aug 7 19:24:43 rx37-0 kernel: [<ffffffffa060d278>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
Aug 7 19:24:43 rx37-0 kernel: [<ffffffffa06101d6>] try_read+0x2b6/0x430 [libceph]
Aug 7 19:24:43 rx37-0 kernel: [<ffffffffa0610688>] con_work+0x78/0x220 [libceph]
Aug 7 19:24:43 rx37-0 kernel: [<ffffffff8108d60c>] process_one_work+0x17c/0x420
Aug 7 19:24:43 rx37-0 kernel: [<ffffffff8108e7d3>] worker_thread+0x123/0x420
Aug 7 19:24:43 rx37-0 kernel: [<ffffffff8108e6b0>] ? maybe_create_worker+0x180/0x180
Aug 7 19:24:43 rx37-0 kernel: [<ffffffff810943be>] kthread+0xce/0xf0
Aug 7 19:24:43 rx37-0 kernel: [<ffffffff810942f0>] ? kthread_freezable_should_stop+0x70/0x70
Aug 7 19:24:43 rx37-0 kernel: [<ffffffff8164ae3c>] ret_from_fork+0x7c/0xb0
Aug 7 19:24:43 rx37-0 kernel: [<ffffffff810942f0>] ? kthread_freezable_should_stop+0x70/0x70
Aug 7 19:24:43 rx37-0 kernel: INFO: task kworker/3:0:24 blocked for more than 120 seconds.
Aug 7 19:24:43 rx37-0 kernel: Not tainted 3.16.0-1.el6.elrepo.x86_64 #1
Aug 7 19:24:43 rx37-0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 7 19:24:43 rx37-0 kernel: kworker/3:0 D 0000000000000003 0 24 2 0x00000000
Aug 7 19:24:43 rx37-0 kernel: Workqueue: ceph-msgr con_work [libceph]
Aug 7 19:24:43 rx37-0 kernel: ffff881030027c98 0000000000000046 ffff881019afe330 ffff881030024010
(...)

Any ideas? With kernel 3.10.32 on the client side everything worked fine.

Best regards,
Dieter Kasper
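P.S. In case more data is useful: next time it wedges I plan to capture the kernel client's state before rebooting. A minimal sketch of what I'd collect, assuming debugfs is available on the client (the ceph entries only exist while images are mapped):

mount -t debugfs none /sys/kernel/debug    # skip if already mounted
cat /sys/kernel/debug/ceph/*/osdc          # requests in flight to the OSDs
cat /sys/kernel/debug/ceph/*/monc          # monitor session state
echo w > /proc/sysrq-trigger               # dump blocked (D-state) tasks to the kernel log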