Hi Jan,
Thanks for the advice, it hit the nail on the head.
I checked the limits and watched the number of fds; as soon as the count reached the soft limit (1024), the transfer came to a grinding halt and the VM started locking up.
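For anyone who wants to repeat the check, here is a rough sketch that combines the two commands from Jan's reply (quoted below) into one helper. The function name fd_usage is made up for illustration, and it assumes a Linux /proc layout where the soft limit is the fourth field of the "Max open files" row. Reading another user's /proc/PID/fd usually needs root.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): report how many file
# descriptors a process has open next to its soft "Max open files" limit.
# Usage: fd_usage PID
fd_usage() {
    pid=$1
    # Row looks like: "Max open files  1024  4096  files"; field 4 is the soft limit.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    # Every entry under /proc/PID/fd is one open descriptor.
    used=$(ls "/proc/$pid/fd" | wc -l)
    echo "pid=$pid fds=$used soft_limit=$soft"
}

# Example (PID is a placeholder for the QEMU process ID):
#   while sleep 1; do fd_usage PID; done
```

Running it in a loop during a large copy should show the fd count climbing towards the soft limit shortly before the transfer stalls, if the cause is the same as it was here.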
After your reply I also did some more googling and found another old thread:
I increased max_files in qemu.conf and restarted libvirtd and the VM (as per Dan's solution in the thread above), and now it seems happy copying files of any size to the rbd. I also confirmed that the fd count now goes past the previous soft limit of 1024.
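In case it helps someone searching later, the change can be sketched roughly as below. This is only an illustration under some assumptions: the helper name set_max_files is made up, /etc/libvirt/qemu.conf is the usual location but may differ on your distribution, 32768 is an arbitrary example value rather than a recommendation, and sed -i -E assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical helper: set max_files in a libvirt qemu.conf-style file.
# Usage: set_max_files CONF_FILE VALUE
set_max_files() {
    conf=$1
    value=$2
    if grep -Eq '^[[:space:]]*#?[[:space:]]*max_files[[:space:]]*=' "$conf"; then
        # Replace an existing (possibly commented-out) max_files line.
        sed -i -E "s/^[[:space:]]*#?[[:space:]]*max_files[[:space:]]*=.*/max_files = $value/" "$conf"
    else
        # No such line yet, so append one.
        printf 'max_files = %s\n' "$value" >> "$conf"
    fi
}

# Example, run as root (path and value are assumptions):
#   set_max_files /etc/libvirt/qemu.conf 32768
#   systemctl restart libvirtd
```

As far as I can tell the new limit only applies to QEMU processes started after libvirtd is restarted, which is why the VM had to be restarted as well.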
Thanks again!!
Raf
On 2 September 2015 at 18:44, Jan Schermer <jan@xxxxxxxxxxx> wrote:
1) Take a look at the number of file descriptors the QEMU process is using; I think you are over the limit.
pid=<pid of qemu process>
cat /proc/$pid/limits
echo /proc/$pid/fd/* | wc -w
2) Jumbo frames may be the cause. Are they enabled on the rest of the network? In any case, get rid of NetworkManager ASAP and set the MTU manually, though it looks like your NIC might not support jumbo frames.
Jan
> On 02 Sep 2015, at 01:44, Rafael Lopez <rafael.lopez@xxxxxxxxxx> wrote:
>
> Hi ceph-users,
>
> Hoping to get some help with a tricky problem. I have a RHEL 7.1 VM guest (host machine also RHEL 7.1) with its root disk presented from ceph 0.94.2-0 (rbd) using libvirt.
>
> The VM also has a second rbd for storage presented from the same ceph cluster, also using libvirt.
>
> The VM boots fine, with no apparent issues with the OS root rbd. I am able to mount the storage disk in the VM and create a file system. I can even transfer small files to it. But when I try to transfer moderately sized files, e.g. greater than 1GB, the transfer slows to a grinding halt, eventually locks up the whole system, and generates the kernel messages below.
>
> I have googled some *similar* issues, but haven't come across any solid advice or fix. So far I have tried modifying the libvirt disk cache settings, using the latest mainline kernel (4.2+), and different file systems (ext4, xfs, zfs); all produce similar results. I suspect it may be network related: while I was using the mainline kernel and transferring some files to the storage disk, this message came up and the transfer seemed to stop at the same time:
>
> Sep 1 15:31:22 nas1-rds NetworkManager[724]: <error> [1441085482.078646] [platform/nm-linux-platform.c:2133] sysctl_set(): sysctl: failed to set '/proc/sys/net/ipv6/conf/eth0/mtu' to '9000': (22) Invalid argument
>
> I think the key to troubleshooting may be that it seems to be OK for files under 1GB.
>
> Any ideas would be appreciated.
>
> Cheers,
> Raf
>
>
> Sep 1 16:04:15 nas1-rds kernel: INFO: task kworker/u8:1:60 blocked for more than 120 seconds.
> Sep 1 16:04:15 nas1-rds kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 1 16:04:15 nas1-rds kernel: kworker/u8:1 D ffff88023fd93680 0 60 2 0x00000000
> Sep 1 16:04:15 nas1-rds kernel: Workqueue: writeback bdi_writeback_workfn (flush-252:80)
> Sep 1 16:04:15 nas1-rds kernel: ffff880230c136b0 0000000000000046 ffff8802313c4440 ffff880230c13fd8
> Sep 1 16:04:15 nas1-rds kernel: ffff880230c13fd8 ffff880230c13fd8 ffff8802313c4440 ffff88023fd93f48
> Sep 1 16:04:15 nas1-rds kernel: ffff880230c137b0 ffff880230fbcb08 ffffe8ffffd80ec0 ffff88022e827590
> Sep 1 16:04:15 nas1-rds kernel: Call Trace:
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8160955d>] io_schedule+0x9d/0x130
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b8d5f>] bt_get+0x10f/0x1a0
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b90ef>] blk_mq_get_tag+0xbf/0xf0
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b4f3b>] __blk_mq_alloc_request+0x1b/0x1f0
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b68a1>] blk_mq_map_request+0x181/0x1e0
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812b7a1a>] blk_sq_make_request+0x9a/0x380
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812aa28f>] ? generic_make_request_checks+0x24f/0x380
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812aa4a2>] generic_make_request+0xe2/0x130
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff812aa561>] submit_bio+0x71/0x150
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01ddc55>] ext4_io_submit+0x25/0x50 [ext4]
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01dde09>] ext4_bio_write_page+0x159/0x2e0 [ext4]
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01d4f6d>] mpage_submit_page+0x5d/0x80 [ext4]
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01d5232>] mpage_map_and_submit_buffers+0x172/0x2a0 [ext4]
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffffa01da313>] ext4_writepages+0x733/0xd60 [ext4]
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff81162b6e>] do_writepages+0x1e/0x40
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811efe10>] __writeback_single_inode+0x40/0x220
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811f0b0e>] writeback_sb_inodes+0x25e/0x420
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811f0d6f>] __writeback_inodes_wb+0x9f/0xd0
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811f15b3>] wb_writeback+0x263/0x2f0
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff811f2aec>] bdi_writeback_workfn+0x1cc/0x460
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8108f0ab>] process_one_work+0x17b/0x470
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff81613cfc>] ret_from_fork+0x7c/0xb0
> Sep 1 16:04:15 nas1-rds kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
>
>
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Rafael Lopez
Data Storage Administrator
Servers & Storage (eSolutions)
+61 3 990 59118