Hi ceph-users,
Hoping to get some help with a tricky problem. I have a RHEL 7.1 VM guest (the host machine is also RHEL 7.1) whose root disk is presented from Ceph 0.94.2-0 as an RBD volume via libvirt.
The VM also has a second RBD for storage, presented from the same Ceph cluster, again via libvirt.
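For reference, the storage disk is attached with a libvirt network-disk stanza roughly like the one below (pool, image, and monitor names are anonymised placeholders, not my real values):

  <disk type='network' device='disk'>
    <!-- cache= is the attribute I've been experimenting with
         (none / writethrough / writeback) -->
    <driver name='qemu' type='raw' cache='writeback'/>
    <source protocol='rbd' name='rbd-pool/nas1-storage'>
      <host name='ceph-mon1' port='6789'/>
    </source>
    <target dev='vdb' bus='virtio'/>
  </disk>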
The VM boots fine, with no apparent issues on the root RBD. I am able to mount the storage disk in the VM, create a filesystem on it, and even transfer small files to it. But when I try to transfer a moderately sized file, e.g. anything greater than about 1 GB, the transfer slows to a grinding halt, eventually the whole system locks up, and the kernel messages below are generated.
I have googled around for *similar* issues, but haven't come across any solid advice or a fix. So far I have tried modifying the libvirt disk cache settings (the cache= attribute in the disk XML above), running the latest mainline kernel (4.2+), and different filesystems (ext4, XFS, ZFS); all produce similar results. I suspect it may be network related: while I was on the mainline kernel, transferring some files to the storage disk, the message below came up and the transfer seemed to stop at the same time:
Sep 1 15:31:22 nas1-rds NetworkManager[724]: <error> [1441085482.078646] [platform/nm-linux-platform.c:2133] sysctl_set(): sysctl: failed to set '/proc/sys/net/ipv6/conf/eth0/mtu' to '9000': (22) Invalid argument
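If I'm reading that error right, the kernel won't accept a per-interface IPv6 MTU larger than the link MTU, so eth0 was presumably still at a smaller MTU (e.g. the default 1500) when NetworkManager tried to set 9000. This is roughly how I've been checking it (eth0 is the guest's interface):

  # link-level MTU on the interface
  ip link show eth0
  # per-interface IPv6 MTU; the kernel requires this to be <= the link MTU
  cat /proc/sys/net/ipv6/conf/eth0/mtu
  # raising the link MTU first should make the 9000 setting valid
  ip link set eth0 mtu 9000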
I think maybe the key piece of troubleshooting info is that transfers seem to be fine for files under 1 GB.
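One theory on that under-1 GB threshold: smaller files may fit entirely in the dirty page cache before the kernel has to flush anything to the RBD, so the stall would only surface once writeback kicks in (and the hung task in the trace below is indeed the writeback worker, bdi_writeback_workfn). These are the knobs and counters I've been watching while reproducing it:

  # percentage of RAM that may be dirty before writers block outright
  sysctl vm.dirty_ratio
  # percentage at which background writeback starts
  sysctl vm.dirty_background_ratio
  # watch dirty/writeback page counts grow during a transfer
  grep -E 'Dirty|Writeback' /proc/meminfo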
Any ideas would be appreciated.
Cheers,
Raf
Sep 1 16:04:15 nas1-rds kernel: INFO: task kworker/u8:1:60 blocked for more than 120 seconds.
Sep 1 16:04:15 nas1-rds kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 1 16:04:15 nas1-rds kernel: kworker/u8:1 D ffff88023fd93680 0 60 2 0x00000000
Sep 1 16:04:15 nas1-rds kernel: Workqueue: writeback bdi_writeback_workfn (flush-252:80)
Sep 1 16:04:15 nas1-rds kernel: ffff880230c136b0 0000000000000046 ffff8802313c4440 ffff880230c13fd8
Sep 1 16:04:15 nas1-rds kernel: ffff880230c13fd8 ffff880230c13fd8 ffff8802313c4440 ffff88023fd93f48
Sep 1 16:04:15 nas1-rds kernel: ffff880230c137b0 ffff880230fbcb08 ffffe8ffffd80ec0 ffff88022e827590
Sep 1 16:04:15 nas1-rds kernel: Call Trace:
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8160955d>] io_schedule+0x9d/0x130
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b8d5f>] bt_get+0x10f/0x1a0
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b90ef>] blk_mq_get_tag+0xbf/0xf0
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b4f3b>] __blk_mq_alloc_request+0x1b/0x1f0
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b68a1>] blk_mq_map_request+0x181/0x1e0
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b7a1a>] blk_sq_make_request+0x9a/0x380
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812aa28f>] ? generic_make_request_checks+0x24f/0x380
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812aa4a2>] generic_make_request+0xe2/0x130
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812aa561>] submit_bio+0x71/0x150
Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01ddc55>] ext4_io_submit+0x25/0x50 [ext4]
Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01dde09>] ext4_bio_write_page+0x159/0x2e0 [ext4]
Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01d4f6d>] mpage_submit_page+0x5d/0x80 [ext4]
Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01d5232>] mpage_map_and_submit_buffers+0x172/0x2a0 [ext4]
Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01da313>] ext4_writepages+0x733/0xd60 [ext4]
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff81162b6e>] do_writepages+0x1e/0x40
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811efe10>] __writeback_single_inode+0x40/0x220
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811f0b0e>] writeback_sb_inodes+0x25e/0x420
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811f0d6f>] __writeback_inodes_wb+0x9f/0xd0
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811f15b3>] wb_writeback+0x263/0x2f0
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811f2aec>] bdi_writeback_workfn+0x1cc/0x460
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8108f0ab>] process_one_work+0x17b/0x470
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff81613cfc>] ret_from_fork+0x7c/0xb0
Sep 1 16:04:15 nas1-rds kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140